Description

Background of the Invention 

[0001] The estimated 50,000-100,000 genes scattered along the human
chromosomes offer tremendous promise for the understanding, diagnosis,
and treatment of human diseases. In addition, probes capable of
specifically hybridizing to loci distributed throughout the human
genome find applications in the construction of high resolution
chromosome maps and in the identification of individuals.

[0002] In the past, the characterization of even a single human gene
was a painstaking process, requiring years of effort. Recent
developments in the areas of cloning vectors, DNA sequencing, and
computer technology have merged to greatly accelerate the rate at which
human genes can be isolated, sequenced, mapped, and characterized.
Cloning vectors such as yeast artificial chromosomes (YACs) and
bacterial artificial chromosomes (BACs) are able to accept DNA inserts
ranging from 300 to 1000 kilobases (kb) or 100-400 kb in length
respectively, thereby facilitating the manipulation and ordering of DNA
sequences distributed over great distances on the human chromosomes.
Automated DNA sequencing machines permit the rapid sequencing of human
genes. Bioinformatics software enables the comparison of nucleic acid
and protein sequences, thereby assisting in the characterization of
human gene products.

[0003] Currently, two different approaches are being pursued for
identifying and characterizing the genes distributed along the human
genome. In one approach, large fragments of genomic DNA are isolated,
cloned, and sequenced. Potential open reading frames in these genomic
sequences are identified using bioinformatics software. However, this
approach entails sequencing large stretches of human DNA which do not
encode proteins in order to find the protein encoding sequences
scattered throughout the genome. In addition to requiring extensive
sequencing, the bioinformatics software may mischaracterize the genomic
sequences obtained. Thus, the software may produce false positives in
which non-coding DNA is mischaracterized as coding DNA or false
negatives in which coding DNA is mislabeled as non-coding DNA.

[0004] An alternative approach takes a more direct route to identifying
and characterizing human genes. In this approach, complementary DNAs
(cDNAs) are synthesized from isolated messenger RNAs (mRNAs) which
encode human proteins. Using this approach, sequencing is only
performed on DNA which is derived from protein coding portions of the
genome. Often, only short stretches of the cDNAs are sequenced to
obtain sequences called expressed sequence tags (ESTs). The ESTs may
then be used to isolate or purify extended cDNAs which include
sequences adjacent to the EST sequences. The extended cDNAs may contain
all of the sequence of the EST which was used to obtain them or only a
portion of the sequence of the EST which was used to obtain them. In
addition, the extended cDNAs may contain the full coding sequence of
the gene from which the EST was derived or, alternatively, the extended
cDNAs may include portions of the coding sequence of the gene from
which the EST was derived. It will be appreciated that there may be
several extended cDNAs which include the EST sequence as a result of
alternate splicing or the activity of alternative promoters.
Alternatively, ESTs having partially overlapping sequences may be
identified and contigs comprising the consensus sequences of the
overlapping ESTs may be identified.
[0005] In the past, these short EST sequences were often obtained from
oligo-dT primed cDNA libraries. Accordingly, they mainly corresponded
to the 3' untranslated region of the mRNA. In part, the prevalence of
EST sequences derived from the 3' end of the mRNA is a result of the
fact that typical techniques for obtaining cDNAs are not well suited
for isolating cDNA sequences derived from the 5' ends of mRNAs (Adams
et al., Nature 377:3-174, 1996; Hillier et al., Genome Res. 6:807-828,
1996).

[0006] In addition, in those reported instances where longer cDNA
sequences have been obtained, the reported sequences typically
correspond to coding sequences and do not include the full 5'
untranslated region (5'UTR) of the mRNA from which the cDNA is derived.
5'UTRs are often involved in the regulation of gene expression, by
affecting either the stability or translation of mRNAs. Indeed, 5'UTRs
may contain several features known to affect the initiation of
translation: (i) the distance between the cap structure and the
initiation codon, (ii) the presence of cis-acting elements which may be
either linear sequences such as polypyrimidine tracts (Kaspar et al.,
J. Biol. Chem. 267, 508-514, 1992; Severson et al., Eur J Biochem
229:426-32, 1995) or secondary structures such as IREs (Rouault and
Klausner, Curr Top Cell Regul 35:1-19, 1997), and (iii) upstream open
reading frames or uORFs (Geballe and Morris, Trends Biochem Sci
19:159-64, 1994). Thus, regulation of gene expression may be achieved
through the use of alternative 5'UTRs. For instance, the translation of
the tissue inhibitor of metalloprotease mRNA is enhanced in
mitogenically activated cells through modification of the start codon
of a uORF in its 5'UTR using an alternative promoter (Waterhouse et
al., J Biol Chem 265:5585-9, 1990). Furthermore, modification of the
5'UTR through mutation, insertion or translocation events may even be
implicated in pathogenesis. For instance, the fragile X syndrome, the
most common cause of inherited mental retardation, is partly due to an
insertion of multiple CGG trinucleotides in the 5'UTR of the fragile X
mRNA resulting in the inhibition of protein synthesis via ribosome
stalling (Feng et al., Science 268:731-4, 1995). An aberrant mutation
in regions of the 5'UTR known to inhibit translation of the
proto-oncogene c-myc was shown to result in upregulation of c-myc
protein levels in cells derived from patients with multiple myelomas
(Willis et al., Curr Top Microbiol Immunol 224:269-76, 1997). However,
the use of oligo-dT primed cDNA libraries does not allow the isolation
of complete 5'UTRs, since the incomplete sequences so obtained may not
include the first exon of the mRNA, particularly in situations where
the first exon is short. Furthermore, they may not include some exons,
often short ones, which are located upstream of splicing sites. Thus,
there is a need to obtain sequences derived from the 5' ends of mRNAs.

[0007] While many sequences derived from human chromosomes have
practical applications, approaches based on the identification and
characterization of those chromosomal sequences which encode a protein
product are particularly relevant to diagnostic and therapeutic uses.
In some instances, the sequences used in such therapeutic or diagnostic
techniques may be sequences which encode proteins which are secreted
from the cell in which they are synthesized. Such sequences, as well as
the secreted proteins themselves, are particularly valuable as
potential therapeutic agents. Such proteins are often involved in cell
to cell communication and may be responsible for producing a clinically
relevant response in their target cells. In fact, several secretory
proteins, including tissue plasminogen activator, G-CSF, GM-CSF,
erythropoietin, human growth hormone, insulin, interferon-α,
interferon-β, interferon-γ, and interleukin-2, are currently in
clinical use. These proteins are used to treat a wide range of
conditions, including acute myocardial infarction, acute ischemic
stroke, anemia, diabetes, growth hormone deficiency, hepatitis, kidney
carcinoma, chemotherapy-induced neutropenia and multiple sclerosis. For
these reasons, extended cDNAs encoding secreted proteins or portions
thereof represent a valuable source of therapeutic agents. Thus, there
is a need for the identification and characterization of secreted
proteins and the nucleic acids encoding them.
[0008] In addition to being therapeutically useful themselves,
secretory proteins include short peptides, called signal peptides, at
their amino termini which direct their secretion. These signal peptides
are encoded by the signal sequences located at the 5' ends of the
coding sequences of genes encoding secreted proteins. These signal
peptides can be used to direct the extracellular secretion of any
protein to which they are operably linked. In addition, portions of the
signal peptides, called membrane-translocating sequences, may also be
used to direct the intracellular import of a peptide or protein of
interest. This may prove beneficial in gene therapy strategies in which
it is desired to deliver a particular gene product to cells other than
the cell in which it is produced. Signal sequences encoding signal
peptides also find application in simplifying protein purification
techniques. In such applications, the extracellular secretion of the
desired protein greatly facilitates purification by reducing the number
of undesired proteins from which the desired protein must be selected.
Thus, there exists a need to identify and characterize the 5' portions
of the genes for secretory proteins which encode signal peptides.

[0009] Sequences coding for non-secreted proteins may also find
application as therapeutics or diagnostics. In particular, such
sequences may be used to determine whether an individual is likely to
express a detectable phenotype, such as a disease, as a consequence of
a mutation in the coding sequence for a non-secreted protein or for a
secreted protein. In instances where the individual is at risk of
suffering from a disease or other undesirable phenotype as a result of
a mutation in such a coding sequence, the undesirable phenotype may be
corrected by introducing a normal coding sequence using gene therapy.
Alternatively, if the undesirable phenotype results from overexpression
of the protein encoded by the coding sequence, expression of the
protein may be reduced using antisense or triple helix based
strategies.

[0010] The secreted or non-secreted human polypeptides encoded by the
coding sequences may also be used as therapeutics by administering them
directly to an individual having a condition, such as a disease,
resulting from a mutation in the sequence encoding the polypeptide. In
such an instance, the condition can be cured or ameliorated by
administering the polypeptide to the individual.

[0011] In addition, the secreted or non-secreted human polypeptides or
portions thereof may be used to generate antibodies useful in
determining the tissue type or species of origin of a biological
sample. The antibodies may also be used to determine the cellular
localization of the secreted or non-secreted human polypeptides or the
cellular localization of polypeptides which have been fused to the
human polypeptides. In addition, the antibodies may also be used in
immunoaffinity chromatography techniques to isolate, purify, or enrich
the human polypeptide or a target polypeptide which has been fused to
the human polypeptide.

[0012] Public information on the number of human genes for which the
promoters and upstream regulatory regions have been identified and
characterized is quite limited. In part, this may be due to the
difficulty of isolating such regulatory sequences. Upstream regulatory
sequences such as transcription factor binding sites are typically too
short to be utilized as probes for isolating promoters from human
genomic libraries. Recently, some approaches have been developed to
isolate human promoters. One of them consists of making a CpG island
library (Cross et al., Nature Genetics 6:236-244, 1994). The second
consists of isolating human genomic DNA sequences containing SpeI
binding sites by the use of SpeI binding protein (Mortlock et al.,
Genome Res. 6:327-335, 1996). Both of these approaches have their
limits due to a lack of specificity or because they are not universally
applicable, since only a limited number of promoters have either a CpG
island or a SpeI recognition site and because SpeI binding sites are
not specifically found in promoter regions. Thus, there exists a need
to identify and systematically characterize the 5' portions of the
genes.

[0013] The present 5' ESTs may be used to efficiently identify and
isolate 5'UTRs and upstream regulatory regions which control the
location, developmental stage, rate, and quantity of protein synthesis,
as well as the stability of the mRNA. Once identified and
characterized, these regulatory regions may be utilized in gene therapy
or protein purification schemes to obtain the desired amount and
locations of protein synthesis or to inhibit, reduce, or prevent the
synthesis of undesirable gene products.

[0014] In addition, ESTs containing the 5' ends of protein genes may
include sequences useful as probes for chromosome mapping and the
identification of individuals. Thus, there is a need to identify and
characterize the sequences upstream of the 5' coding sequences of
genes.

Summary of the Invention

[0015] The present invention relates to purified, isolated, or enriched
5' ESTs which include sequences derived from the authentic 5' ends of
their corresponding mRNAs. The term "corresponding mRNA" refers to the
mRNA which was the template for the cDNA synthesis which produced the
5' EST. These sequences will be referred to hereinafter as "5' ESTs."
The present invention also includes purified, isolated or enriched
nucleic acids comprising contigs assembled by determining a consensus
sequence from a plurality of ESTs containing overlapping sequences.
These contigs will be referred to herein as "consensus contigated
ESTs."
[0016] As used herein, the term "purified" does not require absolute
purity; rather, it is intended as a relative definition. Individual 5'
EST clones isolated from a cDNA library have been conventionally
purified to electrophoretic homogeneity. The sequences obtained from
these clones could not be obtained directly either from the library or
from total human DNA. The cDNA clones are not naturally occurring as
such, but rather are obtained via manipulation of a partially purified
naturally occurring substance (messenger RNA). The conversion of mRNA
into a cDNA library involves the creation of a synthetic substance
(cDNA) and pure individual cDNA clones can be isolated from the
synthetic library by clonal selection. Thus, creating a cDNA library
from messenger RNA and subsequently isolating individual clones from
that library results in an approximately 10^4-10^6 fold purification of
the native message. Purification of starting material or natural
material to at least one order of magnitude, preferably two or three
orders, and more preferably four or five orders of magnitude is
expressly contemplated.

[0017] As used herein, the term "isolated" requires that the material
be removed from its original environment (e.g., the natural environment
if it is naturally occurring). For example, a naturally-occurring
polynucleotide present in a living animal is not isolated, but the same
polynucleotide, separated from some or all of the coexisting materials
in the natural system, is isolated.
[0018] As used herein, the term "enriched" means that the 5' EST is
adjacent to "backbone" nucleic acid to which it is not adjacent in its
natural environment. Additionally, to be "enriched" the 5' ESTs will
represent 5% or more of the number of nucleic acid inserts in a
population of nucleic acid backbone molecules. Backbone molecules
according to the present invention include nucleic acids such as
expression vectors, self-replicating nucleic acids, viruses,
integrating nucleic acids, and other vectors or nucleic acids used to
maintain or manipulate a nucleic acid insert of interest. Preferably,
the enriched 5' ESTs represent 15% or more of the number of nucleic
acid inserts in the population of recombinant backbone molecules. More
preferably, the enriched 5' ESTs represent 50% or more of the number of
nucleic acid inserts in the population of recombinant backbone
molecules. In a highly preferred embodiment, the enriched 5' ESTs
represent 90% or more of the number of nucleic acid inserts in the
population of recombinant backbone molecules.
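By way of illustration only, the percentage thresholds recited in the definition of "enriched" may be sketched in Python as follows. The function name, the tier labels, and the use of simple insert counts are assumptions made for this example and do not appear in the specification.

```python
def enrichment_tier(est_inserts: int, total_inserts: int) -> str:
    """Classify a population of backbone molecules by the share of 5' EST inserts.

    The labels are hypothetical names for the 5%, 15%, 50% and 90% thresholds
    recited in the text; integer arithmetic avoids floating-point edge cases.
    """
    if total_inserts <= 0 or est_inserts < 0 or est_inserts > total_inserts:
        raise ValueError("counts must satisfy 0 <= est_inserts <= total_inserts, total > 0")
    # (threshold in percent, label), checked from the strictest tier downwards
    tiers = [
        (90, "highly preferred"),
        (50, "more preferred"),
        (15, "preferred"),
        (5, "enriched"),
    ]
    for percent, label in tiers:
        if est_inserts * 100 >= percent * total_inserts:
            return label
    return "not enriched"
```

For instance, under these assumptions a library in which 20 of 100 inserts are 5' ESTs would fall in the "preferred" tier, while one in which 4 of 100 inserts are 5' ESTs would not be "enriched".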
[0019] "Stringent", "moderate," and "low" hybridization conditions are
as defined below.
[0020] The term "polypeptide" refers to a polymer of amino acids
without regard to the length of the polymer; thus, peptides,
oligopeptides, and proteins are included within the definition of
polypeptide. This term also does not specify or exclude post-expression
modifications of polypeptides; for example, polypeptides which include
the covalent attachment of glycosyl groups, acetyl groups, phosphate
groups, lipid groups and the like are expressly encompassed by the term
polypeptide. Also included within the definition are polypeptides which
contain one or more analogs of an amino acid (including, for example,
non-naturally occurring amino acids, amino acids which only occur
naturally in an unrelated biological system, modified amino acids from
mammalian systems, etc.), polypeptides with substituted linkages, as
well as other modifications known in the art, both naturally occurring
and non-naturally occurring.
[0021] As used interchangeably herein, the terms "nucleic acids",
"oligonucleotides", and "polynucleotides" include RNA, DNA, or RNA/DNA
hybrid sequences of more than one nucleotide in either single chain or
duplex form. The term "nucleotide" is used herein as an adjective to
describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of
any length in single-stranded or duplex form. The term "nucleotide" is
also used herein as a noun to refer to individual nucleotides or
varieties of nucleotides, meaning a molecule, or individual unit in a
larger nucleic acid molecule, comprising a purine or pyrimidine, a
ribose or deoxyribose sugar moiety, and a phosphate group, or
phosphodiester linkage in the case of nucleotides within an
oligonucleotide or polynucleotide. Although the term




EP 1 033 401 A2 



"nucleotide" is also used herein to encompass "modified nucleotides"
which comprise at least one of the following modifications: (a) an
alternative linking group, (b) an analogous form of purine, (c) an
analogous form of pyrimidine, or (d) an analogous sugar. For examples
of analogous linking groups, purines, pyrimidines, and sugars, see for
example PCT publication No. WO 95/04064. The polynucleotide sequences
of the invention may be prepared by any known method, including
synthetic, recombinant, ex vivo generation, or a combination thereof,
as well as utilizing any purification methods known in the art.
[0022] The terms "base paired" and "Watson & Crick base paired" are
used interchangeably herein to refer to nucleotides which can be
hydrogen bonded to one another by virtue of their sequence identities
in a manner like that found in double-helical DNA with thymine or
uracil residues linked to adenine residues by two hydrogen bonds and
cytosine and guanine residues linked by three hydrogen bonds. (See
Stryer, L., Biochemistry, edition, 1995).
[0023] The terms "complementary" or "complement thereof" are used
herein to refer to the sequences of polynucleotides which are capable
of forming Watson & Crick base pairing with another specified
polynucleotide throughout the entirety of the complementary region. For
the purpose of the present invention, a first polynucleotide is deemed
to be complementary to a second polynucleotide when each base in the
first polynucleotide is paired with its complementary base.
Complementary bases are, generally, A and T (or A and U), or C and G.
"Complement" is used herein as a synonym for "complementary
polynucleotide", "complementary nucleic acid" and "complementary
nucleotide sequence". These terms are applied to pairs of
polynucleotides based solely upon their sequences and not any
particular set of conditions under which the two polynucleotides would
actually bind. Preferably, a "complementary" sequence is a sequence
which has an A at each position where there is a T on the opposite
strand, a T at each position where there is an A on the opposite
strand, a G at each position where there is a C on the opposite strand
and a C at each position where there is a G on the opposite strand.
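The sequence-based definition of complementarity given above may be illustrated by the following minimal Python sketch. It assumes, for the purpose of the example only, that both polynucleotides are written 5' to 3' and pair in antiparallel orientation over their entire length; the function names are hypothetical.

```python
# Watson & Crick pairs named in the text: A with T (or U), and C with G.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the complement of a DNA sequence, written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def is_complementary(first: str, second: str) -> bool:
    """True when every base of `first` pairs with the facing base of `second`.

    The test is based solely on the two sequences, not on any hybridization
    conditions. Both inputs are assumed to be written 5' to 3'.
    """
    first, second = first.upper(), second.upper()
    if not first or len(first) != len(second):
        return False
    return all((a, b) in PAIRS for a, b in zip(first, reversed(second)))
```

Under these assumptions, `is_complementary("ATGC", reverse_complement("ATGC"))` holds, and an RNA strand such as "GCAU" is likewise treated as complementary to "AUGC".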

[0024] Thus, 5' ESTs in cDNA libraries in which one or more 5' ESTs
make up 5% or more of the number of nucleic acid inserts in the
backbone molecules are "enriched recombinant 5' ESTs" as defined
herein. Likewise, 5' ESTs in a population of plasmids in which one or
more 5' ESTs of the present invention have been inserted such that they
represent 5% or more of the number of inserts in the plasmid backbone
are "enriched recombinant 5' ESTs" as defined herein. However, 5' ESTs
in cDNA libraries in which 5' ESTs constitute less than 5% of the
number of nucleic acid inserts in the population of backbone molecules,
such as libraries in which backbone molecules having a 5' EST insert
are extremely rare, are not "enriched recombinant 5' ESTs."
[0025] In some embodiments, the present invention relates to 5' ESTs
which are derived from genes encoding secreted proteins. As used
herein, a "secreted" protein is one which, when expressed in a suitable
host cell, is transported across or through a membrane, including
transport as a result of signal peptides in its amino acid sequence.
"Secreted" proteins include without limitation proteins secreted wholly
(e.g. soluble proteins), or partially (e.g. receptors) from the cell in
which they are expressed. "Secreted" proteins also include without
limitation proteins which are transported across the membrane of the
endoplasmic reticulum.
[0026] Such 5' ESTs include nucleic acid sequences, called signal
sequences, which encode signal peptides which direct the extracellular
secretion of the proteins encoded by the genes from which the 5' ESTs
are derived. Generally, the signal peptides are located at the amino
termini of secreted proteins.
[0027] Secreted proteins are translated by ribosomes associated with
the "rough" endoplasmic reticulum. Generally, secreted proteins are
co-translationally transferred to the membrane of the endoplasmic
reticulum. Association of the ribosome with the endoplasmic reticulum
during translation of secreted proteins is mediated by the signal
peptide. The signal peptide is typically cleaved following its
co-translational entry into the endoplasmic reticulum. After delivery
to the endoplasmic reticulum, secreted proteins may proceed through the
Golgi apparatus. In the Golgi apparatus, the proteins may undergo
post-translational modification before entering secretory vesicles
which transport them across the cell membrane.

[0028] The 5' ESTs of the present invention have several important
applications. For example, they may be used to obtain and express cDNA
clones which include the full protein coding sequences of the
corresponding gene products, including the authentic translation start
sites derived from the 5' ends of the coding sequences of the mRNAs
from which the 5' ESTs are derived. These cDNAs will be referred to
hereinafter as "full-length cDNAs." These cDNAs may also include DNA
derived from mRNA sequences upstream of the translation start site. The
full-length cDNA sequences may be used to express the proteins
corresponding to the 5' ESTs. As discussed above, secreted proteins and
non-secreted proteins may be therapeutically important. Thus, the
proteins expressed from the cDNAs may be useful in treating or
controlling a variety of human conditions. The 5' ESTs may also be used
to obtain the corresponding genomic DNA. The term "corresponding
genomic DNA" refers to the genomic DNA which encodes the mRNA from
which the 5' EST was derived.

[0029] Alternatively, the 5' ESTs may be used to obtain and express
extended cDNAs encoding portions of the protein. In the case of
secreted proteins, the portions may comprise the signal peptides of the
secreted proteins or the mature proteins generated when the signal
peptide is cleaved off.

[0030] The present invention includes isolated, purified, or enriched
"EST-related nucleic acids." The terms "isolated", "purified" or
"enriched" have the meanings provided above. As used herein, the term
"EST-related nucleic acids" means the nucleic acids of SEQ ID NOs:
24-4100 and 8178-36681, extended cDNAs obtainable using the nucleic
acids of SEQ ID NOs: 24-4100 and 8178-36681, full-length cDNAs
obtainable using the nucleic acids of SEQ ID NOs: 24-4100 and
8178-36681 or genomic DNAs obtainable using the nucleic acids of SEQ ID
NOs: 24-4100 and 8178-36681. The present invention also includes the
sequences complementary to the EST-related nucleic acids.

[0031] The present invention also includes isolated, purified, or
enriched "fragments of EST-related nucleic acids." The terms
"isolated", "purified" and "enriched" have the meanings described
above. As used herein the term "fragments of EST-related nucleic acids"
means fragments comprising at least 10, 12, 15, 18, 20, 23, 25, 28, 30,
35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides of
the EST-related nucleic acids to the extent that fragments of these
lengths are consistent with the lengths of the particular EST-related
nucleic acids being referred to. The present invention also includes
the sequences complementary to the fragments of the EST-related nucleic
acids.
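The notion of a fragment of a given number of consecutive nucleotides, limited by the length of the parent sequence, may be sketched in Python as below. This is an illustrative assumption about how such fragments could be enumerated; the function name is hypothetical, and the sketch yields windows of exactly the requested length rather than every longer fragment.

```python
from typing import Iterator


def consecutive_fragments(sequence: str, length: int) -> Iterator[str]:
    """Yield every run of `length` consecutive nucleotides in `sequence`.

    Nothing is yielded when `length` exceeds the sequence length, mirroring
    the proviso that fragment lengths must be consistent with the length of
    the particular nucleic acid referred to.
    """
    if length <= 0:
        raise ValueError("length must be a positive integer")
    for start in range(len(sequence) - length + 1):
        yield sequence[start:start + length]
```

For a parent sequence of N nucleotides and a requested length L not exceeding N, the sketch produces N - L + 1 overlapping fragments.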

[0032] The present invention also includes isolated, purified, or
enriched "positional segments of EST-related nucleic acids." The terms
"isolated", "purified", or "enriched" have the meanings provided above.
As used herein, the term "positional segments of EST-related nucleic
acids" includes segments comprising nucleotides 1-25, 26-50, 51-75,
76-100, 101-125, 126-150, 151-175, 176-200, 201-225, 226-250, 251-300,
301-325, 326-350, 351-375, 376-400, 401-425, 426-450, 451-475, 476-500,
501-525, 526-550, 551-575, 576-600 and 601-the terminal nucleotide of
the EST-related nucleic acids to the extent that such nucleotide
positions are consistent with the lengths of the particular EST-related
nucleic acids being referred to. The term "positional segments of
EST-related nucleic acids" also includes segments comprising
nucleotides 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350,
351-400, 401-450, 450-500, 501-550, 551-600 or 601-the terminal
nucleotide of the EST-related nucleic acids to the extent that such
nucleotide positions are consistent with the lengths of the particular
EST-related nucleic acids being referred to. The term "positional
segments of EST-related nucleic acids" also includes segments
comprising nucleotides 1-100, 101-200, 201-300, 301-400, 501-500,
500-600, or 601-the terminal nucleotide of the EST-related nucleic
acids to the extent that such nucleotide positions are consistent with
the lengths of the particular EST-related nucleic acids being referred
to. In addition, the term "positional segments of EST-related nucleic
acids" includes segments comprising nucleotides 1-200, 201-400,
400-600, or 601-the terminal nucleotide of the EST-related nucleic
acids to the extent that such nucleotide positions are consistent with
the lengths of the particular EST-related nucleic acids being referred
to. The present invention also includes the sequences complementary to
the positional segments of EST-related nucleic acids.
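As a non-limiting illustration, positional segments of a fixed width may be computed as in the following Python sketch. The sketch assumes regular, non-overlapping bins up to position 600 followed by a single segment running from position 601 to the terminal nucleotide, and it assumes that a bin extending past the end of the sequence is omitted; the ranges enumerated in the text remain controlling where they differ. The function name and the `cap` parameter are hypothetical.

```python
from typing import List, Tuple


def positional_segments(sequence: str, width: int, cap: int = 600) -> List[Tuple[int, int, str]]:
    """Return (start, end, subsequence) tuples using 1-based inclusive positions.

    Bins of `width` nucleotides are emitted up to position `cap`; any
    nucleotides beyond `cap` form one final segment ending at the terminal
    nucleotide. Bins that do not fit within the sequence are skipped.
    """
    if width <= 0:
        raise ValueError("width must be a positive integer")
    segments = []
    for start in range(1, cap + 1, width):
        end = min(start + width - 1, cap)
        if end > len(sequence):
            break  # position range not consistent with the sequence length
        segments.append((start, end, sequence[start - 1:end]))
    if len(sequence) > cap:
        # Segment from position cap+1 to the terminal nucleotide
        segments.append((cap + 1, len(sequence), sequence[cap:]))
    return segments
```

With a width of 25, a 60-nucleotide sequence yields the segments 1-25 and 26-50 under these assumptions; with a width of 200, a 650-nucleotide sequence yields 1-200, 201-400, 401-600 and 601-650.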

[0033] The present invention also includes isolated, purified, or
enriched "fragments of positional segments of EST-related nucleic
acids." The terms "isolated", "purified", or "enriched" have the
meanings provided above. As used herein, the term "fragments of
positional segments of EST-related nucleic acids" refers to fragments
comprising at least 10, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75,
100, 150, or 200 consecutive nucleotides of the positional segments of
EST-related nucleic acids. The present invention also includes the
sequences complementary to the fragments of positional segments of
EST-related nucleic acids.

[0034] The present invention also includes isolated or purified
"EST-related polypeptides." The terms "isolated" or "purified" have the
meanings provided above. As used herein, the term "EST-related
polypeptides" means the polypeptides encoded by the EST-related nucleic
acids, including the polypeptides of SEQ ID NOs: 4101-8177.

[0035] The present invention also includes isolated or purified
"fragments of EST-related polypeptides." The terms "isolated" or
"purified" have the meanings provided above. As used herein, the term
"fragments of EST-related polypeptides" means fragments comprising at
least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive
amino acids of an EST-related polypeptide to the extent that fragments
of these lengths are consistent with the lengths of the particular
EST-related polypeptides being referred to.

[0036] The present invention also includes isolated or purified
"positional segments of EST-related polypeptides." As used herein, the
term "positional segments of EST-related polypeptides" includes
polypeptides comprising amino acid residues 1-25, 26-50, 51-75, 76-100,
101-125, 126-150, 151-175, 176-200, or 201-the C-terminal amino acid of
the EST-related polypeptides to the extent that such amino acid
residues are consistent with the lengths of the particular EST-related
polypeptides being referred to. The term "positional segments of
EST-related polypeptides" also includes segments comprising amino acid
residues 1-50, 51-100, 101-150, 151-200 or 201-the C-terminal amino
acid of the EST-related polypeptides to the extent that such amino acid
residues are consistent with the lengths of the particular EST-related
polypeptides being referred to. The term "positional segments of
EST-related polypeptides" also includes segments comprising amino acids
1-100 or 101-200 of the EST-related polypeptides to the extent that
such amino acid residues are consistent with the lengths of the
particular EST-related polypeptides being referred to. In addition, the
term "positional segments of EST-related polypeptides" includes
segments comprising amino acid residues 1-200 or 201-the C-terminal
amino acid of the EST-related polypeptides to the extent that such
amino acid residues are consistent with the lengths of the particular
EST-related polypeptides being referred to.
[0037] The present invention also includes isolated or purified
"fragments of positional segments of EST-related polypeptides." The
terms "isolated" or "purified" have the meanings provided above. As
used herein, the term "fragments of positional segments of EST-related
polypeptides" means fragments comprising at least 5, 10, 15, 20, 25,
30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of positional
segments of EST-related polypeptides to the extent that fragments of
these lengths are consistent with the lengths of the particular
EST-related polypeptides being referred to.
[0038] The present invention also includes antibodies 
which specifically recognize the EST-related polypep- 
tides, fragments of EST-related polypeptides, positional 
segments of EST-related polypeptides, or fragments of
positional segments of EST-related polypeptides. In the
case of secreted proteins, such as those of SEQ ID NOs:
7798-7888, antibodies which specifically recognize the
mature protein generated when the signal peptide is
cleaved may also be obtained as described below.
Similarly, antibodies which specifically recognize the
signal peptides of SEQ ID NOs: 4101-4729 or 7798-7888
may also be obtained.

[0039] In some embodiments and in the case of se- 
creted proteins, the EST-related nucleic acids, frag- 
ments of EST-related nucleic acids, positional segments 
of EST-related nucleic acids, or fragments of positional 
segments of nucleic acids include a signal sequence. In 
other embodiments, the EST-related nucleic acids, frag- 
ments of EST-related nucleic acids, positional segments 
of EST-related nucleic acids, or fragments of positional 
segments of nucleic acids may include the full coding 
sequence for the protein or, in the case of secreted
proteins, the full coding sequence of the mature protein
(i.e., the protein generated when the signal polypeptide is
cleaved off). In addition, the EST-related nucleic acids, 
fragments of EST-related nucleic acids, positional seg- 
ments of EST-related nucleic acids, or fragments of po- 
sitional segments of nucleic acids may include regula- 
tory regions upstream of the translation start site or 
downstream of the stop codon which control the 
amount, location, or developmental stage of gene
expression.

[0040] As discussed above, both secreted and non- 
secreted human proteins may be therapeutically impor- 
tant. Thus, the proteins expressed from the EST-related 
nucleic acids, fragments of EST-related nucleic acids, 
positional segments of EST-related nucleic acids, or
fragments of positional segments of nucleic acids may
be useful in treating or controlling a variety of human 
conditions. 

[0041] The EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of
EST-related nucleic acids, or fragments of positional
segments of nucleic acids may be used in forensic
procedures to identify individuals or in diagnostic
procedures to identify individuals having genetic
diseases resulting from abnormal gene expression. In
addition, the EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of
EST-related nucleic acids, or fragments of positional
segments of nucleic acids are useful for constructing a
high resolution map of the human chromosomes.
[0042] The present invention also relates to secretion
vectors capable of directing the secretion of a protein
of interest. Such vectors may be used in gene therapy
strategies in which it is desired to produce a gene prod- 
uct in one cell which is to be delivered to another location 
in the body. Secretion vectors may also facilitate the pu- 
rification of desired proteins. 

[0043] The present invention also relates to expression
vectors capable of directing the expression of an
inserted gene in a desired spatial or temporal manner 
or at a desired level. Such vectors may include
sequences upstream of the EST-related nucleic acids,
fragments of EST-related nucleic acids, positional segments of
EST-related nucleic acids, or fragments of positional 
segments of nucleic acids, such as promoters or up- 
stream regulatory sequences. 

[0044] The present invention also comprises fusion
vectors for making chimeric polypeptides comprising a
first polypeptide and a second polypeptide. Such vec- 
tors are useful for determining the cellular localization 
of the chimeric polypeptides or for isolating, purifying or 
enriching the chimeric polypeptides. 

[0045] The EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of
EST-related nucleic acids, or fragments of positional
segments of nucleic acids may also be used for gene
therapy to control or treat genetic diseases. In the
case of secreted proteins, signal peptides may be fused
to heterologous proteins to direct their extracellular
secretion.
[0046] Bacterial clones containing Bluescript plasmids
having inserts containing the sequence of the
non-clustered 5'ESTs are presently stored at -80°C in 4%
(v/v) glycerol in the inventor's laboratories under the
designations. The non-clustered 5'ESTs are those which
comprise a single EST from a single tissue in the
listing of Table II. The inserts may be recovered from
the stored materials by growing the appropriate clones
on a suitable medium. The Bluescript DNA can then be isolated
using plasmid isolation procedures familiar to those 
skilled in the art such as alkaline lysis minipreps or large 
scale alkaline lysis plasmid isolation procedures. If
desired the plasmid DNA may be further enriched by
centrifugation on a cesium chloride gradient, size
exclusion chromatography, or anion exchange
chromatography. The plasmid DNA obtained using these
procedures may then be manipulated using standard
cloning techniques familiar to those skilled in the art.
Alternatively, a PCR can be done with primers designed
at both ends of the inserted EST-related nucleic acids,
fragments of EST-related nucleic acids, positional
segments of EST-related nucleic acids, or fragments of
positional segments of nucleic acids. The PCR product
which corresponds to
the EST-related nucleic acids, fragments of EST-related 
nucleic acids, positional segments of EST-related nu- 
cleic acids, or fragments of positional segments of nu- 
cleic acids can then be manipulated using standard 
cloning techniques familiar to those skilled in the art. 
[0047] One embodiment of the present invention is a 
purified nucleic acid comprising a sequence selected 
from the group consisting of SEQ ID NOs: 24-4100 and 
SEQ ID NOs: 8178-36681 and sequences complementary
to the sequences of SEQ ID NOs: 24-4100 and SEQ
ID NOs: 8178-36681. 

[0048] Another embodiment of the present invention 
is a purified nucleic acid comprising at least 10
consecutive nucleotides of a sequence selected from the group
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681 and sequences complementary to the se- 
quences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681. 
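
By way of illustration only, whether a candidate nucleic acid contains at least a given number of consecutive nucleotides of a reference sequence, or of the sequence complementary to it, may be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes that the complementary sequence is the reverse complement written 5' to 3' and that only the symbols A, C, G and T are present.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def shares_consecutive_span(candidate, reference, k=10):
    """True if `candidate` contains at least `k` consecutive nucleotides of
    `reference` or of its reverse complement."""
    candidate = candidate.upper()
    reference = reference.upper()
    for target in (reference, reverse_complement(reference)):
        # Slide a window of k nucleotides along the target strand.
        for i in range(len(target) - k + 1):
            if target[i:i + k] in candidate:
                return True
    return False
```

Setting `k=15` would correspond to the 15-nucleotide embodiment recited in the following paragraph.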

[0049] Another embodiment of the present invention 
is a purified nucleic acid comprising at least 15 consec- 
utive nucleotides of a sequence selected from the group 
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681 and sequences complementary to the se- 
quences of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681. 

[0050] A further embodiment of the present invention 
is a purified nucleic acid comprising the coding
sequence of a sequence selected from the group
consisting of SEQ ID NOs: 24-4100.

[0051] Yet another embodiment of the present inven- 
tion is a purified nucleic acid comprising the full coding 
sequences of a sequence selected from the group
consisting of SEQ ID NOs: 3721-3811 wherein the full
coding sequence comprises the sequence encoding the
signal peptide and the sequence encoding the mature 
protein. 

Still another embodiment of the present invention is a
purified nucleic acid comprising a contiguous span of a 
sequence selected from the group consisting of SEQ ID 
NOs: 3721-3811 which encodes the mature protein. 
[0052] Another embodiment of the present invention 
is a purified nucleic acid comprising a contiguous span 
of a sequence selected from the group consisting of 
SEQ ID NOs: 24-652 and 3721-3811 which encodes the
signal peptide. 

[0053] Another embodiment of the present invention 
is a purified nucleic acid encoding a polypeptide com- 
prising a sequence selected from the group consisting 
of the sequences of SEQ ID NOs: 4101-8177. 
[0054] Another embodiment of the present invention 
is a purified nucleic acid encoding a polypeptide com- 
prising a sequence selected from the group consisting 
of the sequences of SEQ ID NOs: 7798-7888.
[0055] Another embodiment of the present invention
is a purified nucleic acid encoding a polypeptide
comprising a mature protein included in a sequence
selected from the group consisting of the sequences of
SEQ ID NOs: 7798-7888.

[0056] Another embodiment of the present invention
is a purified nucleic acid encoding a polypeptide
comprising a signal peptide included in a sequence
selected from the group consisting of the sequences of
SEQ ID NOs: 4101-4729 and 7798-7888.
[0057] Another embodiment of the present invention 
is a purified nucleic acid at least 15, 18, 20, 23, 26,
28, 30, 35, 40, 50, 75, 100, 200, 300, 500 or 1000
nucleotides in length which hybridizes under stringent
conditions to a sequence selected from the group
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681
and sequences complementary to the sequences of 
SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681. 

[0058] Another embodiment of the present invention
is a purified or isolated polypeptide comprising a
sequence selected from the group consisting of the
sequences of SEQ ID NOs: 4101-8177.
[0059] Another embodiment of the present invention
is a purified or isolated polypeptide comprising a
sequence selected from the group consisting of SEQ ID
NOs: 7798-7888. 

[0060] Another embodiment of the present invention
is a purified or isolated polypeptide comprising a
mature protein of a polypeptide selected from the group
consisting of SEQ ID NOs: 7798-7888.
[0061] Another embodiment of the present invention 
is a purified or isolated polypeptide comprising a signal 
peptide of a sequence selected from the group
consisting of the polypeptides of SEQ ID NOs: 4101-4729 and
7798-7888. 

[0062] Another embodiment of the present invention 
is a purified or isolated polypeptide comprising at least 
10 consecutive amino acids of a sequence selected 
from the group consisting of the sequences of SEQ ID
NOs: 4101-8177. 

[0063] Another embodiment of the present invention 
is a method of making a cDNA comprising the steps of 
contacting a collection of mRNA molecules from human
cells with a primer comprising at least 15 consecutive
nucleotides of a sequence selected from the group
consisting of the sequences complementary to SEQ ID
NOs: 24-4100 and SEQ ID NOs: 8178-36681, hybridizing
said primer to an mRNA in said collection that encodes
said protein, reverse transcribing said hybridized
primer to make a first cDNA strand from said mRNA,
making a second cDNA strand complementary to said
first cDNA strand and isolating the resulting cDNA
encoding said protein comprising said first cDNA strand
and said second cDNA strand.

[0064] Another embodiment of the present invention 
is a purified cDNA obtainable by the method of the pre- 
ceding paragraph. 

[0065] In one aspect of this embodiment, the cDNA
encodes at least a portion of a human polypeptide.
[0066] Another embodiment of the present invention
is a method of making a cDNA comprising the steps of
obtaining a cDNA comprising a sequence selected from
the group consisting of SEQ ID NOs: 24-4100 and SEQ
ID NOs: 8178-36681, contacting said cDNA with a
detectable probe comprising at least 15 consecutive
nucleotides of a sequence selected from the group
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and the sequences complementary to SEQ
ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 under
conditions which permit said probe to hybridize to said
cDNA, identifying a cDNA which hybridizes to said de- 
tectable probe, and isolating said cDNA which hybridiz- 
es to said probe. 

[0067] Another embodiment of the present invention
is a purified cDNA obtainable by the method of the pre- 
ceding paragraph. 

[0068] In one aspect of this embodiment, the cDNA 
encodes at least a portion of a human polypeptide. 
[0069] Another embodiment of the present invention 
is a method of making a cDNA comprising the steps of
contacting a collection of mRNA molecules from human
cells with a first primer capable of hybridizing to the
polyA tail of said mRNA, hybridizing said first primer to
said polyA tail, reverse transcribing said mRNA to make
a first cDNA strand, making a second cDNA strand
complementary to said first cDNA strand using at least one
primer comprising at least 15 consecutive nucleotides 
of a sequence selected from the group consisting of 
SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, 
and isolating the resulting cDNA comprising said first 
cDNA strand and said second cDNA strand. 
[0070] Another embodiment of the present invention 
is a purified cDNA obtainable by the method of the
preceding paragraph.
[0071] In one aspect of this embodiment, said cDNA 
encodes at least a portion of a human polypeptide. 
[0072] In another aspect of the preceding method the 
second cDNA strand is made by contacting said first cD- 
NA strand with a first pair of primers, said first pair of 
primers comprising a second primer comprising at least
15 consecutive nucleotides of a sequence selected from
the group consisting of SEQ ID NOs: 24-4100 and SEQ
ID NOs: 8178-36681 and a third primer having a
sequence therein which is included within the sequence
of said first primer, performing a first polymerase chain
reaction with said first pair of primers to generate a
first PCR product, contacting said first PCR product
with a second pair of primers, said second pair of
primers comprising a fourth primer, said fourth primer
comprising at least 15 consecutive nucleotides of said
sequence selected from the group consisting of SEQ ID
NOs: 24-4100 and SEQ ID NOs: 8178-36681, and a fifth
primer, wherein said fourth and fifth primers hybridize
to sequences within said first PCR product, and
performing a second polymerase chain reaction, thereby
generating a second PCR product.

[0073] One aspect of this embodiment is a purified
cDNA obtainable by the method of the preceding
paragraph.

[0074] In another aspect of this embodiment, said
cDNA encodes at least a portion of a human polypeptide.
[0075] Alternatively, the second cDNA strand may be
made by contacting said first cDNA strand with a second
primer comprising at least 15 consecutive nucleotides
of a sequence selected from the group consisting of
SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, 
hybridizing said second primer to said first strand cDNA, 
and extending said hybridized second primer to gener- 
ate said second cDNA strand. 

[0076] One aspect of the above embodiment is a
purified cDNA obtainable by the method of the preceding
paragraph. 

[0077] In a further aspect of this embodiment said cD- 
NA encodes at least a portion of a human polypeptide. 

[0078] Another embodiment of the present invention
is a method of making a polypeptide comprising the
steps of obtaining a cDNA which encodes a polypeptide
encoded by a nucleic acid comprising a sequence
selected from the group consisting of SEQ ID NOs:
24-4100 or a cDNA which encodes a polypeptide
comprising at least 10 consecutive amino acids of a
polypeptide encoded by a sequence selected from the
group consisting of SEQ ID NOs: 24-4100, inserting said
cDNA in an expression vector such that said cDNA is
operably linked to a promoter, introducing said expression
vector into a host cell whereby said host cell produces 
the protein encoded by said cDNA, and isolating said 
protein. 

[0079] Another aspect of this embodiment is an
isolated protein obtainable by the method of the preceding
paragraph. 

[0080] Another embodiment of the present invention 
is a method of obtaining a promoter DNA comprising the 
steps of obtaining genomic DNA located upstream of a
nucleic acid comprising a sequence selected from the
group consisting of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681 and the sequences complementary
to the sequences of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681, screening said genomic DNA to
identify a promoter capable of directing transcription
initiation, and isolating said DNA comprising said
identified promoter.
[0081] In one aspect of this embodiment, said
obtaining step comprises walking from genomic DNA
comprising a sequence selected from the group
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and the sequences complementary to SEQ ID
NOs: 24-4100 and SEQ ID NOs: 8178-36681. In another
aspect of this embodiment, said screening step comprises
inserting genomic DNA located upstream of a sequence
selected from the group consisting of SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681 and the sequences
complementary to SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 into a promoter reporter vector. For
example, said screening step may comprise identifying
motifs in genomic DNA located upstream of a sequence
selected from the group consisting of SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681 and the sequences
complementary to SEQ ID NOs: 24-4100
and SEQ ID NOs: 8178-36681 which are transcription 
factor binding sites or transcription start sites. 
[0082] Another embodiment of the present invention 
is an isolated promoter obtainable by the method of the
paragraph above. 

Another embodiment of the present invention is the in- 
clusion of at least one sequence selected from the group 
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681, the sequences complementary to the
sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and fragments comprising at least 15
consecutive nucleotides of said sequences in an array of
discrete ESTs or fragments thereof of at least 15
nucleotides in length. In some aspects of this
embodiment, the array includes at least two sequences
selected from the group consisting of SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681, the sequences
complementary to the sequences of SEQ ID NOs: 24-4100
and SEQ ID NOs: 8178-36681, and fragments comprising at
least 15 consecutive nucleotides of said sequences. In
another aspect of this embodiment, the array includes at
least five sequences selected from the group consisting
of SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681,
the sequences complementary to the sequences of 
SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 
and fragments comprising at least 15 consecutive nu- 
cleotides of said sequences. 

[0083] Another embodiment of the present invention 
is an enriched population of recombinant nucleic acids, 
said recombinant nucleic acids comprising an insert nu- 
cleic acid and a backbone nucleic acid, wherein at least 
5% of said insert nucleic acids in said population com- 
prise a sequence selected from the group consisting of 
SEQ ID NOs: 24-4100 and SEQ ID NOs: 8178-36681 
and the sequences complementary to SEQ ID NOs: 
24-4100 and SEQ ID NOs: 8178-36681.
[0084] Another embodiment of the present invention 
is a purified or isolated antibody capable of specifically 
binding to a polypeptide comprising a sequence select- 
ed from the group consisting of SEQ ID NOs: 
4101-8177. 

A purified or isolated antibody capable of specifically
binding to a polypeptide comprising at least 10 consec- 
utive amino acids of a sequence selected from the group 
consisting of SEQ ID NOs: 4101-8177. 
An antibody composition capable of selectively binding 
to an epitope-containing fragment of a polypeptide
comprising a contiguous span of at least 8 amino acids of
any of SEQ ID NOs: 4101-8177, wherein said antibody 
is polyclonal or monoclonal. 

[0085] Another embodiment of the present invention 
is a computer readable medium having stored thereon 
a sequence selected from the group consisting of a
nucleic acid code of SEQ ID NOs: 24-4100 and
8178-36681 and a polypeptide code of SEQ ID NOs:
4101-8177. 

[0086] Another embodiment of the present invention
is a computer system comprising a processor and a data
storage device wherein said data storage device has
stored thereon a sequence selected from the group
consisting of a nucleic acid code of SEQ ID NOs: 24-4100
and 8178-36681 and a polypeptide code of SEQ ID
NOs: 4101-8177. In one aspect of this embodiment the
computer system further comprises a sequence
comparer and a data storage device having reference
sequences stored thereon. For example, the sequence
comparer may comprise a computer program which
indicates polymorphisms.

In another aspect of this embodiment, the computer sys- 
tem further comprises an identifier which identifies fea- 
tures in said sequence. 

[0087] Another embodiment of the present invention
is a method for comparing a first sequence to a
reference sequence wherein said first sequence is
selected from the group consisting of a nucleic acid
code of SEQ ID NOs: 24-4100 and 8178-36681 and a
polypeptide code of SEQ ID NOs: 4101-8177 comprising
the steps of reading said first sequence and said
reference sequence through use of a computer program
which compares sequences and determining differences
between said first sequence and said reference sequence
with said computer program. In some aspects of this
embodiment, said step of determining differences
between the
first sequence and the reference sequence comprises 
identifying polymorphisms. 
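
By way of illustration only, the step of determining differences between a first sequence and a reference sequence may be sketched in Python as follows. The function name `find_differences` is hypothetical. The sketch assumes that the two sequences have already been aligned and are of equal length; it is not an alignment algorithm, and an actual sequence comparer would ordinarily align the sequences first.

```python
def find_differences(first, reference):
    """Return (position, reference_symbol, first_symbol) tuples, 1-based.

    Assumption: `first` and `reference` are pre-aligned and equal in length,
    so that each position can be compared directly.
    """
    if len(first) != len(reference):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (position, ref_symbol, first_symbol)
        for position, (first_symbol, ref_symbol)
        in enumerate(zip(first, reference), start=1)
        if first_symbol != ref_symbol
    ]
```

Each reported position is a candidate single-residue difference, which may be examined further as a possible polymorphism.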

[0088] Another embodiment of the present invention 
is a method for identifying a feature in a sequence
selected from the group consisting of a nucleic acid
code of SEQ ID NOs: 24-4100 and 8178-36681 and a
polypeptide code of SEQ ID NOs: 4101-8177 comprising
the steps of reading said sequence through the use
of a computer program which identifies features in
sequences and identifying features in said sequence with
said computer program. 

[0089] Another embodiment of the present invention 
is a vector comprising a nucleic acid according to any
one of the nucleic acids described above.

[0090] Another embodiment of the present invention 
is a host cell containing the above vector. 
[0091] Another embodiment of the present invention 
is a method of making any of the nucleic acids described
above comprising the steps of introducing said nucleic
acid into a host cell such that said nucleic acid is present
in multiple copies in each host cell and isolating said 
nucleic acid from said host cell. 

[0092] Another embodiment of the present invention
is a method of making a nucleic acid of any of the nucleic
acids described above comprising the step of sequen- 
tially linking together the nucleotides in said nucleic ac- 
ids. 

[0093] Another embodiment of the present invention
is a method of making any of the polypeptides described
above wherein said polypeptide is 150 amino acids in
length or less comprising the step of sequentially
linking together the amino acids in said polypeptide.
[0094] Another embodiment of the present invention
is a method of making any of the polypeptides described
above wherein said polypeptide is 120 amino acids in
length or less comprising the step of sequentially linking
together the amino acids in said polypeptide.

Brief Description of the Sequence Listing 

[0095] SEQ ID NOs: 1, 3, 5, 7, 9, 11, and 13 are
full-length cDNAs prepared using the methods described
herein.

[0096] SEQ ID NOs: 2, 4, 6, 8, 10, 12, and 14 are the
polypeptides encoded by the nucleic acids of SEQ ID
NOs: 1, 3, 5, 7, 9, 11, and 13.

[0097] SEQ ID NOs: 15, 16, 18, 19, 21 and 22 are
primers whose use is described in the specification.
[0098] SEQ ID NOs: 17, 20, and 23 are the sequences
of nucleic acids containing transcription factor binding
sites which were obtained as described below.
[0099] SEQ ID NOs: 24-652 are nucleic acids having 
an incomplete ORF which encodes a signal peptide. As 
used herein, an "incomplete ORF" is an open reading 
frame in which a start codon has been identified but no 
stop codon has been identified. The locations of the
incomplete ORFs and sequences encoding signal
peptides are listed in the accompanying Sequence Listing.
In addition, the von Heijne score of the signal peptide
computed as described below is listed as the "score" in
the accompanying Sequence Listing. The sequence of
the signal-peptide is listed as "seq" in the accompanying
Sequence Listing. The "/" in the signal peptide sequence
indicates the location where proteolytic cleavage of the
signal peptide occurs to generate a mature protein.
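
By way of illustration only, a "seq" entry in which "/" marks the cleavage site may be split into the signal peptide and the residues that follow it with a short Python sketch. The function name and the example entry are hypothetical and are not taken from the Sequence Listing. The sketch assumes that the entry is a single string of one-letter amino acid codes containing exactly one "/".

```python
def split_signal_peptide(seq_entry):
    """Split a "seq" entry at "/" into (signal_peptide, following_residues).

    Assumption: the entry holds one-letter residue codes with a single "/"
    marking where proteolytic cleavage of the signal peptide occurs.
    """
    if seq_entry.count("/") != 1:
        raise ValueError("expected exactly one '/' cleavage marker")
    signal_peptide, _, following = seq_entry.partition("/")
    return signal_peptide, following


# Hypothetical entry, for illustration only.
signal, mature_start = split_signal_peptide("MKLLVLALCLLAAVSA/QE")
cleavage_after_residue = len(signal)  # 1-based position of the last signal residue
```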
[0100] SEQ ID NOs: 653-3720 are nucleic acids hav- 
ing an incomplete ORF in which no sequence encoding 
a signal peptide has been identified to date. However, it 
remains possible that subsequent analysis will identify 
a sequence encoding a signal peptide in these nucleic 
acids. The locations of the incomplete ORFs are listed 
in the accompanying Sequence Listing. 
[0101] SEQ ID NOs: 3721-3811 are nucleic acids hav- 
ing a complete ORF which encodes a signal peptide. As 
used herein, a "complete ORF" is an open reading frame 
in which a start codon and a stop codon have been iden- 
tified. The locations of the complete ORFs and sequenc- 
es encoding signal peptides are listed in the accompa- 
nying Sequence Listing. In addition, the von Heijne 
score of the signal peptide computed as described be- 
low is listed as the "score" in the accompanying Se- 
quence Listing. The sequence of the signal-peptide is 
listed as "seq" in the accompanying Sequence Listing. 
The "/" in the signal peptide sequence indicates the lo- 
cation where proteolytic cleavage of the signal peptide 
occurs to generate a mature protein.
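
By way of illustration only, the distinction drawn above between an "incomplete ORF" (start codon identified, no stop codon identified) and a "complete ORF" (start codon and stop codon identified) may be sketched in Python as follows. The function name `classify_orf` is hypothetical. The sketch makes the simplifying assumption that the ORF begins at the first ATG in the sequence and is read in that frame; it does not reproduce the analysis actually used to locate the ORFs listed in the Sequence Listing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_orf(cdna):
    """Return (kind, start, end) with 1-based positions.

    kind is "complete" (ATG and in-frame stop found), "incomplete"
    (ATG found, no in-frame stop before the sequence ends) or "none".
    `end` is the last base of the stop codon, or None.
    Assumption: the first ATG in the sequence is taken as the start codon.
    """
    seq = cdna.upper()
    start = seq.find("ATG")
    if start == -1:
        return ("none", None, None)
    # Walk codon by codon in the frame set by the start codon.
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return ("complete", start + 1, i + 3)
    return ("incomplete", start + 1, None)
```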
[0102] SEQ ID NOs: 3812-4100 are nucleic acids 
having a complete ORF in which no sequence encoding
a signal peptide has been identified to date. However, it
remains possible that subsequent analysis will identify
a sequence encoding a signal peptide in these nucleic
acids. The locations of the complete ORFs are listed in 
the accompanying Sequence Listing. 
[0103] SEQ ID NOs: 4101-4729 are "incomplete
polypeptide sequences" which include a signal peptide.
"Incomplete polypeptide sequences" are polypeptide
sequences encoded by nucleic acids in which a start
codon has been identified but no stop codon has been
identified. These polypeptides are encoded by the
nucleic acids of SEQ ID NOs: 24-652. The location of the
signal peptide is listed in the accompanying Sequence
Listing. In addition, the von Heijne score of the signal
peptide computed as described below is listed as the
"score" in the accompanying Sequence Listing. The
sequence of the signal-peptide is listed as "seq" in the
accompanying Sequence Listing. The "/" in the signal
peptide sequence indicates the location where proteolytic
cleavage of the signal peptide occurs to generate a
mature protein.

[0104] SEQ ID NOs: 4730-7797 are incomplete
polypeptide sequences in which no signal peptide has
been identified to date. However, it remains possible
that subsequent analysis will identify a signal peptide in
these polypeptides. These polypeptides are encoded by
the nucleic acids of SEQ ID NOs: 653-3720.

[0105] SEQ ID NOs: 7798-7888 are "complete
polypeptide sequences" which include a signal peptide.
"Complete polypeptide sequences" are polypeptide
sequences encoded by nucleic acids in which a start
codon and a stop codon have been identified. These
polypeptides are encoded by the nucleic acids of SEQ
ID NOs: 3721-3811. The location of the signal peptide
is listed in the accompanying Sequence Listing. In
addition, the von Heijne score of the signal peptide
computed as described below is listed as the "score" in the
accompanying Sequence Listing. The sequence of the
signal-peptide is listed as "seq" in the accompanying
Sequence Listing. The "/" in the signal peptide sequence
indicates the location where proteolytic cleavage of the
signal peptide occurs to generate a mature protein.
[0106] SEQ ID NOs: 7889-8177 are complete
polypeptide sequences in which no signal peptide has
been identified to date. However, it remains possible
that subsequent analysis will identify a signal peptide in
these polypeptides. These polypeptides are encoded by
the nucleic acids of SEQ ID NOs: 3812-4100.
[0107] SEQ ID NOs: 8178-36681 are nucleic acid
sequences in which no open reading frame has been
conclusively identified to date. However, it remains
possible that subsequent analysis will identify an open
reading frame in these nucleic acids.

[0108] In the accompanying Sequence Listing, all in- 
stances of the symbol "n" in the nucleic acid sequences 
mean that the nucleotide can be adenine, guanine,
cytosine or thymine. In some instances the polypeptide
sequences in the Sequence Listing contain the symbol
"Xaa." These "Xaa" symbols indicate either (1) a residue
which cannot be identified because of nucleotide
sequence ambiguity or (2) a stop codon in the determined
sequence where applicants believe one should not exist
(if the sequence were determined more accurately). In
some instances, several possible identities of the
unknown amino acids may be suggested by the genetic
code. 
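
By way of illustration only, the possible identities of a residue encoded by a codon containing the symbol "n" may be listed from the standard genetic code with a short Python sketch. The function name `possible_residues` is hypothetical. The sketch assumes the standard genetic code, a DNA alphabet of A, C, G and T, and uses "*" to denote a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: residue
    for (a, b, c), residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def possible_residues(codon):
    """Return the set of one-letter residues a codon may encode.

    Each "n" is expanded to all four bases; "*" in the result means that
    at least one expansion is a stop codon.
    """
    codon = codon.upper()
    if len(codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    options = [BASES if base == "N" else base for base in codon]
    return {CODON_TABLE[a + b + c] for a, b, c in product(*options)}
```

Under these assumptions, a codon such as "GCn" resolves to a single residue despite the ambiguity, whereas "AAn" leaves two candidates, which is the situation in which an "Xaa" symbol would carry several suggested identities.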

Brief Description of the Drawings 

[0109] Figure 1 summarizes the computer analysis 
procedure for obtaining consensus contigated ESTs. 
[0110] Figure 2 is an analysis of the 43 amino terminal
amino acids of all human SwissProt proteins to deter- 
mine the frequency of false positives and false nega- 
tives using the techniques for signal peptide identifica- 
tion described herein. 

[0111] Figure 3 illustrates methods for making
extended cDNAs.

[0112] Figure 4 provides a schematic description of 
the promoters isolated and the way they are assembled 
with the corresponding 5' tags. 

[0113] Figure 5 describes the transcription factor 
binding sites present in each of these promoters. 

Detailed Description of the Preferred Embodiment 

I. General Methods for Obtaining 5' ESTs derived
from mRNAs with intact 5' ends

[0114] In order to obtain the 5' ESTs of the present 
invention, mRNAs with intact 5' ends must be obtained.
Example 1 below describes the preparation of 5' ESTs.

EXAMPLE 1 

Preparation of mRNA 

[0115] Total human RNAs or polyA+ RNAs derived
from 30 different tissues were respectively purchased
from LABIMO and CLONTECH and used to generate
42 cDNA libraries as described below. The purchased
RNA had been isolated from cells or tissues using acid
guanidinium thiocyanate-phenol-chloroform extraction
(Chomczynski and Sacchi, Analytical Biochemistry
162:156-159, 1987). PolyA+ RNA was isolated from
total RNA (LABIMO) by two passes of oligo dT
chromatography, as described by Aviv and Leder (Proc.
Natl. Acad. Sci. USA 69:1408-1412, 1972), in order to
eliminate ribosomal RNA.

[0116] The quality and the integrity of the polyA+
RNAs were checked. Northern blots hybridized with a
globin probe were used to confirm that the mRNAs were
not degraded. Contamination of the polyA+ mRNAs by
ribosomal sequences was checked using Northern blots
and a probe derived from the sequence of the 28S
rRNA. Preparations of mRNAs with less than 5% of rRNAs
were used in library construction. To avoid constructing
libraries with RNAs contaminated by exogenous
sequences (prokaryotic or fungal), the presence of
bacterial 16S ribosomal sequences or of two highly
expressed fungal mRNAs was examined using PCR.
[0117] Following preparation of the mRNAs from various
tissues, an oligonucleotide tag was specifically attached
to the caps at the 5' ends of the mRNAs. The oligonucleotide
tag had an EcoRI site therein to facilitate later cloning
procedures. Following attachment of the oligonucleotide tag
to the mRNA, the integrity of the mRNA was examined by
performing a Northern blot with 200 to 500 ng of mRNA using
a probe complementary to the oligonucleotide tag before
performing the first strand synthesis described in
Example 2.

EXAMPLE 2

cDNA Synthesis Using mRNA Templates Having Intact 
5' Ends 

[0118] For the mRNAs joined to oligonucleotide tags, first
strand cDNA synthesis was performed using a reverse
transcriptase with random nonamers as primers. In order to
protect internal EcoRI sites in the cDNA from digestion at
later steps in the procedure, methylated dCTP was used for
first strand synthesis. After removal of RNA by an alkaline
hydrolysis, the first strand of cDNA was precipitated using
isopropanol in order to eliminate residual primers.

[0119] The second strand of the cDNA was synthesized with a
Klenow fragment using a primer corresponding to the 5' end
of the ligated oligonucleotide. Methylated dCTP was also
used for second strand synthesis in order to protect
internal EcoRI sites in the cDNA from digestion during the
cloning process.
[0120] Following cDNA synthesis, the cDNAs were cloned into
pBlueScript as described in Example 3 below.

EXAMPLE 3 

Cloning of cDNAs derived from mRNA with intact 5' ends
into BlueScript

[0121] Following second strand synthesis, the ends of the
cDNA were blunted with T4 DNA polymerase (Biolabs) and the
cDNA was digested with EcoRI. Since methylated dCTP was
used during cDNA synthesis, the EcoRI site present in the
tag was the only hemi-methylated site, hence the only site
susceptible to EcoRI digestion. The cDNA was then size
fractionated using exclusion chromatography (AcA, Biosepra)
and fractions corresponding to cDNAs of more than 150 bp
were pooled and ethanol precipitated. The cDNA was
directionally cloned into the SmaI and EcoRI ends of the
phagemid pBlueScript vector (Stratagene). The ligation
mixture was electroporated into bacteria and propagated
under appropriate antibiotic selection.
[0122] Clones containing the oligonucleotide tag attached
were then selected as described in Example 4 below.

EXAMPLE 4 

Selection of Clones Having the Oligonucleotide Tag 
Attached Thereto 

[0123] The plasmid DNAs containing 5' EST libraries made as
described above were purified (Qiagen). A positive
selection of the tagged clones was performed as follows.
Briefly, in this selection procedure, the plasmid DNA was
converted to single stranded DNA using gene II endonuclease
of the phage F1 in combination with an exonuclease (Chang
et al., Gene 127:95-8, 1993) such as exonuclease III or T7
gene 6 exonuclease. The resulting single stranded DNA was
then purified using paramagnetic beads as described by Fry
et al., Biotechniques, 13:124-131, 1992. In this procedure,
the single stranded DNA was hybridized with a biotinylated
oligonucleotide having a sequence corresponding to the 3'
end of the oligonucleotide tag. Clones including a sequence
complementary to the biotinylated oligonucleotide were
captured by incubation with streptavidin coated magnetic
beads followed by magnetic selection. After capture of the
positive clones, the plasmid DNA was released from the
magnetic beads and converted into double stranded DNA using
a DNA polymerase such as the ThermoSequenase obtained from
Amersham Pharmacia Biotech. The double stranded DNA was
then electroporated into bacteria. The percentage of
positive clones having the 5' tag oligonucleotide was
estimated to typically range between 90 and 98% using dot
blot analysis.

[0124] Following electroporation, the libraries were 
ordered in 384-microtiter plates (MTP). A copy of the 
MTP was stored for future needs. Then the libraries 
were transferred into 96 MTP and sequenced as de- 
scribed below. 

EXAMPLE 5 

Sequencing of Inserts in Selected Clones 

[0125] Plasmid inserts were first amplified by PCR on
PE-9600 thermocyclers (Perkin-Elmer, Applied Biosystems
Division, Foster City, CA), using standard SETA-A and
SETA-B primers (Genset SA), AmpliTaq Gold (Perkin-Elmer),
dNTPs (Boehringer), buffer and cycling conditions as
recommended by the Perkin-Elmer Corporation.

[0126] PCR products were then sequenced using automatic ABI
Prism 377 sequencers (Perkin Elmer). Sequencing reactions
were performed using PE 9600 thermocyclers with standard
dye-primer chemistry and ThermoSequenase (Amersham
Pharmacia Biotech). The primers used were either T7 or 21
M13 (available from Genset SA) as appropriate. The primers
were labeled with the JOE, FAM, ROX and TAMRA dyes. The
dNTPs and ddNTPs used in the sequencing reactions were
purchased from Boehringer. Sequencing buffer, reagent
concentrations and cycling conditions were as recommended
by Amersham.

[0127] Following the sequencing reaction, the samples were
precipitated with ethanol, resuspended in formamide loading
buffer, and loaded on a standard 4% acrylamide gel.
Electrophoresis was performed for 2.5 hours at 3000V on an
ABI 377 sequencer, and the sequence data were collected and
analyzed using the ABI Prism DNA Sequencing Analysis
Software, version 2.1.2.

EXAMPLE 6

Obtaining 5' ESTs from Full-length cDNA libraries 
Obtained from mRNA with Intact 5' Ends 

[0128] Alternatively, 5' ESTs may be isolated from other
cDNA or genomic DNA libraries. Such cDNA or genomic DNA
libraries may be obtained from a commercial source or made
using other techniques familiar to those skilled in the
art. One example of such cDNA library construction, a
full-length cDNA library, is as follows.

[0129] PolyA+ RNAs are prepared and their quality checked
as described in Example 1. Then, the caps at the 5' ends of
the polyA+ RNAs are specifically joined to an
oligonucleotide tag. The oligonucleotide tag may contain a
restriction site such as EcoRI to facilitate further
subcloning procedures. Northern blotting is then performed
to check the size of mRNAs having the oligonucleotide tag
attached thereto and to ensure that the mRNAs were actually
tagged.

[0130] First strand synthesis is subsequently carried out
for mRNAs joined to the oligonucleotide tag as described in
Example 2 above, except that the random nonamers are
replaced by an oligo-dT primer. For instance, this oligo-dT
primer may contain an internal tag of 4 nucleotides which
is different from one tissue to the other. Following second
strand synthesis using a primer contained in the
oligonucleotide tag attached to the 5' end of mRNA, the
blunt ends of the obtained double stranded full-length DNAs
are modified into cohesive ends to facilitate subcloning.
For example, the extremities of full-length cDNAs may be
modified to allow subcloning into the EcoRI and HindIII
sites of a Bluescript vector using the EcoRI site of the
oligonucleotide tag and the addition of a HindIII adaptor
to the 3' end of full-length cDNAs.

[0131] The full-length cDNAs are then separated into
several fractions according to their sizes using techniques
familiar to those skilled in the art. For example,
electrophoretic separation may be applied in order to yield
3 or 6 different fractions. Following gel extraction and
purification, the cDNA fractions are subcloned into
appropriate vectors, such as Bluescript vectors,
transformed into competent bacteria and propagated under
appropriate antibiotic conditions. Subsequently, plasmids
containing tagged full-length cDNAs are positively selected
as described in Example 4.
[0132] The 5' end of full-length cDNAs isolated from such
cDNA libraries may then be sequenced as described in
Example 5.

II.2. Computer Analysis of the Isolated 5' ESTs:
construction of NetGene™ and SignalTag™
databases

[0133] The sequence data from the 42 cDNA libraries made as
described above were transferred to a database, where
quality control and validation steps were performed. A
base-caller, working using a Unix system, automatically
flagged suspect peaks, taking into account the shape of the
peaks, the inter-peak resolution, and the noise level. The
proprietary base-caller also performed an automatic
trimming. Any stretch of 25 or fewer bases having more than
4 suspect peaks was considered unreliable and was
discarded. Sequences corresponding to cloning vector or
ligation oligonucleotides were automatically removed from
the EST sequences. However, the resulting EST sequences may
contain 1 to 5 bases belonging to the above mentioned
sequences at their 5' end. If needed, these can easily be
removed on a case by case basis.
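
The trimming rule described above can be illustrated with a minimal Python sketch. It assumes that each base carries a boolean "suspect" flag from the base-caller and reads the rule as: any window of 25 or fewer bases holding more than 4 suspect peaks is unreliable. The function names, the flag representation, and the choice to keep the longest reliable run are assumptions made for illustration; the proprietary base-caller itself is not disclosed here.

```python
def unreliable_mask(suspect, window=25, max_suspect=4):
    """Mark every base lying in a window with too many suspect peaks.

    suspect: list of bools, True where a peak was flagged as suspect.
    Windows at the end of the read are shorter than `window`, which
    matches the "25 or fewer bases" wording (assumed interpretation).
    """
    n = len(suspect)
    prefix = [0]
    for flag in suspect:
        prefix.append(prefix[-1] + (1 if flag else 0))
    bad = [False] * n
    for start in range(n):
        end = min(start + window, n)
        if prefix[end] - prefix[start] > max_suspect:
            for i in range(start, end):
                bad[i] = True
    return bad


def longest_reliable_run(suspect, window=25, max_suspect=4):
    """Return (start, end) of the longest stretch free of unreliable windows."""
    bad = unreliable_mask(suspect, window, max_suspect)
    best = (0, 0)
    run_start = None
    for i, flag in enumerate(bad + [True]):  # sentinel closes the last run
        if not flag and run_start is None:
            run_start = i
        elif flag and run_start is not None:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best
```

For a read whose flags are given as a list, `longest_reliable_run(flags)` returns half-open coordinates that can be used to slice the base sequence.
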

[0134] Following sequencing as described above, the
sequences of the 5' ESTs were entered in NetGene™, a
database for storage and manipulation as described below
and as depicted in Figure 1. Before searching the ESTs in
the NetGene™ database for sequences of interest, ESTs
derived from mRNAs which were not of interest, such as
endogenous or exogenous contaminants, redundant sequences,
small sequences, highly degenerate sequences, or repeated
sequences were identified and eliminated from further
consideration.
[0135] In order to determine the accuracy of the sequencing
procedure as well as the efficiency of the 5' selection
described above, the analyses described in Examples 7 and 8
respectively were performed on 5' ESTs obtained from the
NetGene™ database following the elimination of sequences
which were not of interest.

EXAMPLE 7 

Measurement of Sequencing Accuracy by Comparison 
to Known Sequences 

[0136] To further determine the accuracy of the sequencing
procedure described in Example 5, the sequences of
NetGene™ 5' ESTs derived from known sequences were
identified and compared to the original known sequences.
First, a FASTA analysis with overhangs shorter than 5 bp on
both ends was conducted on the 5' ESTs to identify those
matching an entry in the public human mRNA database. The
6655 5' ESTs which matched a known human mRNA were then
realigned with their cognate mRNA and dynamic programming
was used to include substitutions, insertions, and
deletions in the list of "errors" which would be
recognized. Errors occurring in the last 10 bases of the 5'
EST sequences were ignored to avoid the inclusion of
spurious cloning sites in the analysis of sequencing
accuracy.
[0137] This analysis revealed that the sequences
incorporated in the NetGene™ database had an accuracy of
more than 99.5%.
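
As an illustration of the error counting described in Example 7, the following Python sketch aligns an EST to its cognate mRNA by dynamic programming and counts substitutions, insertions, and deletions while ignoring the last 10 bases of the EST. It is a simplified stand-in, not the procedure actually used: it assumes that the reference starts at the same position as the EST, and the function names and the unit cost per error are illustrative choices.

```python
def alignment_errors(est, reference):
    """Minimum number of substitutions, insertions and deletions needed to
    align the whole EST to some prefix of the reference (free reference tail)."""
    prev = list(range(len(reference) + 1))
    for i, est_base in enumerate(est, 1):
        curr = [i]
        for j, ref_base in enumerate(reference, 1):
            curr.append(min(
                prev[j] + 1,                          # extra base in the EST
                curr[j - 1] + 1,                      # base missing from the EST
                prev[j - 1] + (est_base != ref_base)  # match or substitution
            ))
        prev = curr
    return min(prev)


def sequencing_accuracy(est, reference, ignore_tail=10):
    """Fraction of correct bases, ignoring the last `ignore_tail` EST bases."""
    trimmed = est[:-ignore_tail] if ignore_tail else est
    if not trimmed:
        raise ValueError("EST is too short once the tail is ignored")
    return 1.0 - alignment_errors(trimmed, reference) / len(trimmed)
```

Averaging `sequencing_accuracy` over all ESTs that match a known mRNA would give a database-wide figure comparable in spirit to the one reported above.
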


EXAMPLE 8 

Determination of Efficiency of 5' EST Selection

[0138] To determine the efficiency at which the above
selection procedures isolated 5' ESTs which included
sequences close to the 5' end of the mRNAs from which they
derived, the sequences of the ends of the 5' ESTs derived
from the elongation factor 1 subunit α and ferritin heavy
chain genes were compared to the known cDNA sequences of
these genes. Since the transcription start sites of both
genes are well characterized, they may be used to determine
the percentage of derived 5' ESTs which included the
authentic transcription start sites.

[0139] For both genes, more than 95% of the obtained 5'
ESTs actually included sequences close to or upstream of
the 5' end of the corresponding mRNAs.
[0140] To extend the analysis of the reliability of the
procedures for isolating 5' ESTs from ESTs in the NetGene™
database, a similar analysis was conducted using a database
composed of human mRNA sequences extracted from GenBank
database release 97 for comparison. The 5' ends of more
than 85% of 5' ESTs derived from mRNAs included in the
GenBank database were located close to the 5' ends of the
known sequence. As some of the mRNA sequences available in
the GenBank database are deduced from genomic sequences, a
5' end matching with these sequences will be counted as an
internal match. Thus, the method used here underestimates
the yield of ESTs including the authentic 5' ends of their
corresponding mRNAs.

EXAMPLE 9 

Clustering of the 5' ESTs

[0141] Since the cDNA libraries made above include multiple
5' ESTs derived from the same mRNA, overlapping 5' ESTs may
be assembled into continuous sequences. The following
method (see Figure 1) describes how to efficiently cluster
5' ESTs in order to yield not only consensus 5' EST
sequences for mRNAs derived from different genes but also
consensus 5' EST sequences for different mRNAs, so called
variants, transcribed from the same gene, such as
alternatively spliced mRNAs. This clustering was performed
on a set of NetGene™ 5' EST sequences following
elimination of endogenous contaminants, elimination of
uninformative sequences and masking of repeats.

[0142] The whole set of sequences was first partitioned
into smaller sets, so-called clusters, containing sequences
exhibiting perfect matches with each other on a given
length. Such clusters contain 5' ESTs derived from a small
number of different genes. Some 5' EST sequences were not
clustered using this approach either because they were not
homologous to any other sequence or because the homology
was not properly detected. To overcome this problem,
sequences not clustered, so called singletons, may be
compared to the consensus contigated ESTs obtained later on
and, if necessary, included in the appropriate clusters and
used to compute other consensus contigated ESTs.
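
The partitioning step described above can be sketched in Python under one stated assumption: two sequences are placed in the same cluster when they share an identical word of a chosen length `k`, and clusters are merged transitively with a union-find structure. The value of `k`, the function name, and the use of union-find are illustrative assumptions; the actual matching length and algorithm are not given in the text.

```python
def cluster_by_shared_word(sequences, k):
    """Group sequence indices that share at least one exact word of length k.

    Returns a sorted list of clusters; clusters of size 1 are singletons.
    """
    parent = list(range(len(sequences)))

    def find(x):
        # Path-halving lookup of the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            owner = first_owner.setdefault(word, idx)
            if owner != idx:
                parent[find(idx)] = find(owner)  # merge the two clusters

    clusters = {}
    for idx in range(len(sequences)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())
```

Because the 5' ESTs are directionally cloned, this sketch compares sequences on one strand only; a general-purpose tool would also need to consider reverse complements.
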
[0143] Thereafter, all variants of a given gene were
identified in each cluster as follows. Overlapping
sequences inside a given cluster were represented as
oriented graphs where each sequence was a node and each
overlap an edge. Then, the different genes contained within
a single graph, which were represented by different
connected components, were identified and isolated from
each other. Subsequently, the different variants of the
same gene were isolated using an algorithm based on the
detection of forks within a connected component. If
desired, the consensus contigated EST sequences may be
verified by identifying clones in nucleic acid samples
derived from biological tissues, such as cDNA libraries,
which hybridize to probes based on the sequences of the
consensus contigated ESTs, and sequencing them.
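
A minimal Python sketch of the graph step follows. It treats sequences as nodes and overlaps as oriented edges (from the upstream sequence to the downstream one), separates connected components, and reports candidate forks. The fork definition used here, a node with two successors that do not overlap each other, is an assumption made for illustration, as are the function names; the text does not specify the algorithm in more detail.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Split the overlap graph into connected components (edge direction ignored)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for node in nodes:
        if node in seen:
            continue
        seen.add(node)
        stack, component = [node], []
        while stack:
            current = stack.pop()
            component.append(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


def fork_nodes(edges):
    """Nodes with two successors that do not overlap each other (assumed
    signature of two variants diverging from a shared upstream sequence)."""
    successors = defaultdict(set)
    linked = set()
    for a, b in edges:
        successors[a].add(b)
        linked.add(frozenset((a, b)))
    result = []
    for node, succ in successors.items():
        ordered = sorted(succ)
        if any(frozenset((x, y)) not in linked
               for i, x in enumerate(ordered) for y in ordered[i + 1:]):
            result.append(node)
    return sorted(result)
```

Each component would correspond to one gene and each fork to a point where variants of that gene diverge.
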

[0144] Overlapping 5' EST sequences belonging to the same
variant as well as included 5' EST sequences belonging to
the same cluster were then contigated and consensus
contigated 5' EST sequences were generated for each
variant. Some of the obtained consensus contigated 5' EST
sequences were incomplete due to the fact that only
included and overlapping 5' EST sequences were considered
to isolate genes and due to the algorithm developed to find
variants. These variant consensus contigated 5' EST
sequences were extended as follows. Variants transcribed
from the same gene were compared pairwise and the 5' EST
consensus sequences that were incomplete either in 5'
and/or in 3' were extended with the appropriate sequence
from the other variants. All 5' EST consensus sequences
eventually completed in 5' or 3' from each cluster were
subsequently compared to the whole set of individual 5' EST
sequences obtained for this cluster.

EXAMPLE 10 


Identification of the Most Probable Open Reading 
Frame of 5' ESTs 

[0145] Subsequently, the most probable coding open reading
frame (ORF) may be determined for each consensus assembled
5' EST or 5' EST as follows.
[0146] Each nucleic acid sequence is first divided into
several subsequences whose coding propensity is evaluated
using different methods known to those skilled in the art,
such as the evaluation of N-mer frequency and its variants
(Fickett and Tung, Nucleic Acids Res. 20:6441-50 (1992)) or
the Average Mutual Information method (Grosse et al.,
International Conference on Intelligent Systems for
Molecular Biology, Montreal, Canada, June 28-July 1, 1998).
Each of the scores obtained by the techniques described
above is then normalized by its distribution extremities
and then fused using a neural network into a unique score
that represents the coding probability of a given
subsequence.
[0147] The coding probability scores obtained for each
subsequence, and thus the probability score profiles
obtained for each reading frame, are then linked to the
initiation codons present on the sequence. For each open
reading frame, defined as a nucleic acid sequence of at
least 50 nucleotides beginning with an ATG codon, an ORF
score is determined. Basically, this score is the sum of
the probability scores computed for each subsequence
corresponding to the considered ORF in the correct reading
frame, corrected by a function that negatively weights
locally high score values and positively weights sustained
high score values. The chosen ORF is the one with the
highest score.
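
The ORF definition and selection rule above can be sketched in Python as follows. The sketch enumerates every ATG-initiated reading frame of at least 50 nucleotides, running to the first in-frame stop codon or to the end of the sequence, and picks the candidate with the highest summed score. The per-codon scoring function `coding_prob` is a hypothetical placeholder supplied by the caller, and the correction function that down-weights locally high values and up-weights sustained high values is deliberately omitted because the text does not define it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=50):
    """Return (start, end, complete) for each ATG-initiated ORF of >= min_len nt.

    `end` is exclusive; `complete` is True when a stop codon closes the ORF
    and False when the ORF runs to the end of the sequence.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        end, complete = start, False
        while end + 3 <= len(seq):
            codon = seq[end:end + 3]
            end += 3
            if codon in STOP_CODONS:
                complete = True
                break
        if end - start >= min_len:
            orfs.append((start, end, complete))
    return orfs


def best_orf(seq, coding_prob, min_len=50):
    """Pick the ORF with the highest summed per-codon score (uncorrected sum)."""
    seq = seq.upper()

    def score(orf):
        start, end, _ = orf
        return sum(coding_prob(seq[i:i + 3]) for i in range(start, end - 2, 3))

    candidates = find_orfs(seq, min_len)
    return max(candidates, key=score) if candidates else None
```

With a constant `coding_prob`, the selection simply favours the longest ORF; a realistic scoring function would come from the coding-propensity measures cited in the text.
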

[0148] Two kinds of ORFs are considered. In some
embodiments, 5' ESTs encoding ORFs of at least 50 amino
acids extending up to the end of the consensus assembled 5'
EST sequences are obtained. In other embodiments, 5' ESTs
encoding complete ORFs, namely ORFs with start and stop
codons, containing at least 100 amino acids are obtained.

EXAMPLE 11 


Sequence Analysis 

[0149] Application of the clustering method described in
Example 9 to a selected set of 126,735 NetGene™ 5' ESTs
free from endogenous contaminants and uninformative
sequences yielded 9490 consensus assembled 5' EST sequences
or variants for a total of 8037 genes clustered,
representing 98,973 individual 5' ESTs. One cluster, which
contained 21,138 sequences and was shown by comparison to
public sequences to contain chimeras, was removed from
further analysis.
[0150] Both non clustered 5' ESTs, i.e. singletons, and
consensus contigated 5' ESTs were then compared to already
known sequences as follows. Those sequences matching human
mRNA sequences were eliminated from further analysis. Then,
following masking of repeats, those sequences matching
sequences that have already been discovered by the
inventors, namely sequences exhibiting more than 90%
homology over stretches longer than 40 nucleotides using
BLAST2N with overhangs shorter than 10 nucleotides, were
removed from further consideration. The final set
represents the sequences of the invention (SEQ ID NOs:
24-4100 and 8178-36681), i.e. 7609 consensus contigated 5'
ESTs from 6398 clusters containing 31,267 5' ESTs and
24,972 singletons.

[0151] Of the 6398 obtained clusters, 658 were shown to be
multivariant, i.e. to contain several variants of the same
gene. Table I gives, for each of the multivariant clusters
named by its internal reference (first column), the list of
the consensus sequences of all variants, each variant being
represented by a different SEQ ID NO.
[0152] Subsequently, the most probable open reading frame
was determined, as described in Example 10, for all
sequences of the invention. 3,697 5' ESTs (SEQ ID NOs:
24-3720) encoding incomplete ORFs (SEQ ID NOs: 4101-7797)
of at least 50 amino acids long were found. In addition,
380 5' ESTs (SEQ ID NOs: 3721-4100) encoding complete ORFs
(SEQ ID NOs: 7798-8177) of at least 100 amino acids were
found.
[0153] The nucleotide sequences of the SEQ ID NOs: 24-4100
and 8178-36681 and the amino acid sequences encoded by SEQ
ID NOs: 24-4100 (i.e. amino acid sequences of SEQ ID NOs:
4101-8177) are provided in the appended sequence listing.
Some of the amino acid sequences may contain "Xaa"
designators. These "Xaa" designators indicate either (1) a
residue which cannot be identified because of nucleotide
sequence ambiguity or (2) a stop codon in the determined
sequence where applicants believe one should not exist (if
the sequence were determined more accurately).
[0154] If one of the nucleic acid sequences of SEQ ID NOs:
24-4100 and 8178-36681 is suspected of containing one or
more incorrect or ambiguous nucleotides, the ambiguities
can readily be resolved by resequencing a fragment
containing the nucleotides to be evaluated. If one or more
incorrect or ambiguous nucleotides are detected, the
corrected sequences should be included in the clusters from
which the sequences were isolated, and used to compute
other consensus contigated sequences on which other ORFs
would be identified. Nucleic acid fragments for resolving
sequencing errors or ambiguities may be obtained from
deposited clones or can be isolated using the techniques
described herein. Resolution of any such ambiguities or
errors may be facilitated by using primers which hybridize
to sequences located close to the ambiguous or erroneous
sequences. For example, the primers may hybridize to
sequences within 50-75 bases of the ambiguity or error.
Upon resolution of an error or ambiguity, the corresponding
corrections can be made in the protein sequences encoded by
the DNA containing the error or ambiguity. The amino acid
sequence of the protein encoded by a particular clone can
also be determined by expression of the clone in a suitable
host cell, collecting the protein, and determining its
sequence.

[0155] In addition, if one of the sequences of SEQ ID NOs:
4101-8177 is suspected of containing a truncated ORF as the
result of a frameshift in the sequence, such frameshifting
errors may be corrected by combining the following two
approaches. The first one involves thorough examination of
all double predictions, i.e. all cases where the
probability scores for two ORFs located on different
reading frames are high and close, preferably different by
less than 0.4. The fine examination of the region where the
two possible ORFs overlap may help to detect the
frameshift. In the second approach, homologies with known
proteins are used to correct suspected frameshifts.
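
The "double prediction" test of the first approach can be sketched in Python. The record layout (a dict with `name`, `frame`, and `score` keys), the function name, and the `min_score` cut-off for what counts as a "high" score are hypothetical choices for illustration; only the 0.4 difference comes from the text.

```python
from itertools import combinations


def double_predictions(orfs, min_score, max_gap=0.4):
    """Return pairs of ORF names on different reading frames whose scores are
    both at least `min_score` and differ by less than `max_gap`."""
    flagged = []
    for a, b in combinations(orfs, 2):
        if a["frame"] == b["frame"]:
            continue  # same frame: not a frameshift candidate
        if min(a["score"], b["score"]) < min_score:
            continue  # at least one of the two ORFs is not a strong prediction
        if abs(a["score"] - b["score"]) < max_gap:
            flagged.append((a["name"], b["name"]))
    return flagged
```

Each flagged pair points to a region where the two candidate ORFs overlap and where a frameshift may be looked for by closer examination.
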

EXAMPLE 12

Identification of Potential Signal Sequences in 5' ESTs 

[0156] The amino acid sequences of SEQ ID NOs: 4101-8177
were then searched to identify potential signal motifs
using slight modifications of the procedures disclosed in
Von Heijne, Nucleic Acids Res. 14:4683-4690, 1986. Those
sequences encoding a 15 amino acid long stretch with a
score of at least 3.5 in the Von Heijne signal peptide
identification matrix were considered to possess a signal
sequence and were included in a database called
SIGNALTAG™.
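
The window-scoring step can be illustrated with a short Python sketch. It slides a position-specific weight matrix along the protein and reports the best-scoring window, then applies the 3.5 threshold from the text. The matrix itself is not reproduced here: `weights` is a caller-supplied list of per-position dictionaries, and the toy values used when trying the code are hypothetical, not the published Von Heijne weights.

```python
def best_window_score(protein, weights, default=0.0):
    """Return (score, start) of the best-scoring window, or None if the
    protein is shorter than the matrix.

    weights: one dict per window position mapping residue -> weight;
    residues absent from a dict receive `default`.
    """
    width = len(weights)
    best = None
    for start in range(len(protein) - width + 1):
        score = sum(weights[i].get(protein[start + i], default)
                    for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best


def has_signal_sequence(protein, weights, threshold=3.5):
    """True when some window reaches the threshold (3.5 in the text)."""
    best = best_window_score(protein, weights)
    return best is not None and best[0] >= threshold
```

A 15-position matrix corresponds to the 15 amino acid stretch mentioned above; the start index of the best window indicates where the putative signal motif lies.
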
[0157] The sequences of the 720 nucleic acid sequences
containing a signal sequence (SEQ ID NOs: 24-652 and
3721-3811) and the corresponding polypeptides with a
potential signal peptide (SEQ ID NOs: 4101-4729 and
7798-7888) are provided in the Sequence Listing appended
hereto. The signal peptides of such polypeptides are
indicated as features in the appended Sequence Listing. It
should be noted that, in accordance with the regulations
governing Sequence Listings, in the appended Sequence
Listing, the full protein (i.e. the protein containing the
signal peptide and the mature protein) extends from an
amino acid residue having a negative number through a
positively numbered C-terminal amino acid residue. Thus,
the first amino acid of the mature protein resulting from
cleavage of the signal peptide is designated as amino acid
number 1, and the first amino acid of the signal peptide is
designated with the appropriate negative number.

[0158] To confirm the accuracy of the above method for
identifying signal sequences, the analysis of Example 13
was performed.

EXAMPLE 13

Confirmation of Accuracy of Identification of Potential
Signal Sequences in 5' ESTs

[0159] The accuracy of the above procedure for identifying
signal sequences encoding signal peptides was evaluated by
applying the method to the 43 amino acids located at the N
terminus of all human SwissProt proteins. The computed Von
Heijne score for each protein was compared with the known
characterization of the protein as being a secreted protein
or a non-secreted protein. In this manner, the number of
non-secreted proteins having a score higher than 3.5 (false
positives) and the number of secreted proteins having a
score lower than 3.5 (false negatives) could be calculated.
[0160] Using the results of the above analysis, the
probability that a peptide encoded by the 5' region of the
mRNA is in fact a genuine signal peptide based on its Von
Heijne's score was calculated based on either the
assumption that 10% of human proteins are secreted or the
assumption that 20% of human proteins are secreted. The
results of this analysis are shown in Figure 2.
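
One way to read the calculation above is as an application of Bayes' rule, sketched here in Python. The function name and the example rates used when trying the code are hypothetical; the actual false positive and false negative rates are those measured on SwissProt and shown in Figure 2, and are not reproduced here.

```python
def probability_genuine_signal(sensitivity, false_positive_rate, prior_secreted):
    """Posterior probability that a peptide scoring above the threshold is a
    genuine signal peptide.

    sensitivity:         fraction of secreted proteins scoring above threshold
                         (1 minus the false negative rate)
    false_positive_rate: fraction of non-secreted proteins scoring above threshold
    prior_secreted:      assumed fraction of human proteins that are secreted
                         (0.10 or 0.20 in the text)
    """
    true_hits = sensitivity * prior_secreted
    false_hits = false_positive_rate * (1.0 - prior_secreted)
    return true_hits / (true_hits + false_hits)
```

For the same error rates, raising the assumed secreted fraction from 10% to 20% raises the posterior, which is why the analysis is reported under both assumptions.
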
[0161] Using the above method of identification of
secretory proteins, 5' ESTs of the following polypeptides
known to be secreted were obtained: human glucagon, gamma
interferon induced monokine precursor, secreted
cyclophilin-like protein, human pleiotrophin, and human
biotinidase precursor. Thus, the above method successfully
identified those 5' ESTs which encode a signal peptide.

[0162] To confirm that the signal peptide encoded by the 5'
ESTs or contigated consensus 5' ESTs actually functions as
a signal peptide, the signal sequences from the 5' ESTs or
consensus 5' ESTs may be cloned into a vector designed for
the identification of signal peptides. Such vectors are
designed to confer the ability to grow in selective medium
only to host cells containing a vector with an operably
linked signal sequence. For example, to confirm that a 5'
EST or consensus 5' EST encodes a genuine signal peptide,
the signal sequence of the 5' EST or consensus 5' EST may
be inserted upstream and in frame with a non-secreted form
of the yeast invertase gene in signal peptide selection
vectors such as those described in U.S. Patent No.
5,536,637. Growth of host cells containing signal sequence
selection vectors with the correctly inserted 5' EST or
consensus 5' EST signal sequence confirms that the 5' EST
or consensus 5' EST encodes a genuine signal peptide.
[0163] Alternatively, the presence of a signal peptide may
be confirmed by cloning the extended cDNAs obtained using
the ESTs or consensus 5' ESTs into expression vectors such
as pXT1 as described below, or by constructing
promoter-signal sequence-reporter gene vectors which encode
fusion proteins between the signal peptide and an assayable
reporter protein. After introduction of these vectors into
a suitable host cell, such as COS cells or NIH 3T3 cells,
the growth medium may be harvested and analyzed for the
presence of the secreted protein. The medium from these
cells is compared to the medium from control cells
containing vectors lacking the signal sequence or extended
cDNA insert to identify vectors which encode a functional
signal peptide or an authentic secreted protein.

EXAMPLE 14 

Assessment of the novelty rate of 5' ESTs

[0164] To assess the yield of new sequences, the obtained
5' ESTs and consensus contigated 5' ESTs were compared to
all known human mRNAs extracted from the EMBL release 57
and daily updates available at the time of filing. The
comparison was performed using BLAST2N on both strands
following masking of the repeats. Sequences having more
than 95% homology with public sequences over their whole
length with at most 10 nucleotide overhangs on each
extremity were considered as previously identified. Thus,
about 90% of 5' ESTs or consensus assembled 5' ESTs were
considered unidentified.


II.3. Evaluation of Spatial and Temporal Expression
of mRNAs Corresponding to the 5' ESTs or Extended
cDNAs

[0165] Each of the SEQ ID NOs: 24-4100 and 8178-36681 was
also categorized based on the tissue from which its
corresponding mRNA was obtained, as described below in
Example 15.

EXAMPLE 15

Expression Patterns of mRNAs From Which the 5'ESTs 
were obtained 

[0166] Table II shows the spatial distribution of each of
the 5' ESTs (non-clustered ESTs) and of each consensus
contigated EST respectively. Table II provides the SEQ ID
NOs of the 5' ESTs (referred to alternatively herein as
non-clustered ESTs or singletons) and consensus contigated
ESTs. Table II also lists the number of ESTs from each type
of tissue which were used to assemble the contigated
consensus ESTs. The SEQ ID NOs in Table II which contain a
single 5' EST from a single tissue are 5' ESTs. Each type
of tissue listed in Table II is encoded by a letter. The
correspondence between the letter code and the tissue type
is given in Table III. For example, the consensus contigated
EST of SEQ ID NO: 47 contains one 5' EST from cancerous
prostate, two 5' ESTs from lymph ganglia, and two 5' ESTs
from testes.

[0167] In addition to categorizing the 5' ESTs and
consensus contigated 5' ESTs with respect to their tissue
of origin, the spatial and temporal expression patterns of
the mRNAs corresponding to the 5' ESTs and consensus
contigated 5' ESTs, as well as their expression levels, may
be determined as described in Example 16 below.

[0168] Characterization of the spatial and temporal
expression patterns and expression levels of these mRNAs is
useful for constructing expression vectors capable of
producing a desired level of gene product in a desired
spatial or temporal manner, as will be discussed in more
detail below.

[0169] Furthermore, 5' ESTs and consensus contigated 5'
ESTs whose corresponding mRNAs are associated with disease
states may also be identified. For example, a particular
disease may result from the lack of expression, over
expression, or under expression of a mRNA corresponding to
a 5' EST or consensus contigated 5' EST. By comparing mRNA
expression patterns and quantities in samples taken from
healthy individuals with those from individuals suffering
from a particular disease, 5' ESTs or consensus contigated
5' ESTs responsible for the disease may be identified.
[0170] It will be appreciated that the results of the above
characterization procedures for 5' ESTs and consensus
contigated 5' ESTs also apply to extended cDNAs (obtainable
as described below) which contain sequences adjacent to the
5' ESTs and consensus contigated 5' ESTs. It will also be
appreciated that if desired, characterization may be
delayed until extended cDNAs have been obtained rather than
characterizing the 5' ESTs or consensus contigated 5' ESTs
themselves.

EXAMPLE 16 

Evaluation of Expression Levels and Patterns of
mRNAs Corresponding to EST-Related Nucleic Acids

[0171] Expression levels and patterns of mRNAs
corresponding to EST-related nucleic acids may be analyzed
by solution hybridization with long probes as described in
International Patent Application No. WO 97/05277. Briefly,
an EST-related nucleic acid, fragment of an EST-related
nucleic acid, positional segment of an EST-related nucleic
acid, or fragment of a positional segment of an EST-related
nucleic acid corresponding to the gene encoding the mRNA to
be characterized is inserted at a cloning site immediately
downstream of a bacteriophage (T3, T7 or SP6) RNA
polymerase promoter to produce antisense RNA. Preferably,
the EST-related nucleic acid, fragment of an EST-related
nucleic acid, positional segment of an EST-related nucleic
acid, or fragment of a positional segment of an EST-related
nucleic acid is 100 or more nucleotides in length. The
plasmid is linearized and transcribed in the presence of
ribonucleotides comprising modified ribonucleotides (i.e.
biotin-UTP and DIG-UTP). An excess of this doubly labeled
RNA is hybridized in solution with mRNA isolated from cells
or tissues of interest. The hybridizations are performed
under standard stringent conditions (40-50°C for 16 hours
in an 80% formamide, 0.4 M NaCl buffer, pH 7-8). The
unhybridized probe is removed by digestion with
ribonucleases specific for single-stranded RNA (i.e. RNases
CL3, T1, Phy M, U2 or A). The presence of the biotin-UTP
modification enables capture of the hybrid on a
microtitration plate coated with streptavidin. The presence
of the DIG modification enables the hybrid to be detected
and quantified by ELISA using an anti-DIG antibody coupled
to alkaline phosphatase.
[0172] The EST-related nucleic acid, fragment of an EST-related
nucleic acid, positional segment of an EST-related nucleic
acid, or fragment of a positional segment of an EST-related
nucleic acid may also be tagged with nucleotide sequences for
the serial analysis of gene expression (SAGE) as disclosed in
UK Patent Application No. 2 305 241 A. In this method, cDNAs
are prepared from a cell, tissue, organism or other source of
nucleic acid for which gene expression patterns must be
determined. The resulting cDNAs are separated into two pools.
The cDNAs in each pool are cleaved with a first restriction
endonuclease, called an anchoring enzyme, having a recognition
site which is likely to be present at least once in most cDNAs.
The fragments which contain the 5' or 3' most region of the
cleaved cDNA are isolated by binding to a capture medium such
as streptavidin coated beads. A first oligonucleotide linker
having a first sequence for hybridization of an amplification
primer and an internal restriction site for a so-called tagging
endonuclease is ligated to the digested cDNAs in the first
pool. Digestion with the second endonuclease produces short tag
fragments from the cDNAs.
[0173] A second oligonucleotide having a second sequence for
hybridization of an amplification primer and an internal
restriction site is ligated to the digested cDNAs in the second
pool. The cDNA fragments in the second pool are also digested
with the tagging endonuclease to generate short tag fragments
derived from the cDNAs in the second pool. The tags resulting
from digestion of the first and second pools with the anchoring
enzyme and the tagging endonuclease are ligated to one another
to produce so-called ditags. In some embodiments, the ditags
are concatamerized to produce ligation products containing from
2 to 200 ditags. The tag sequences are then determined and
compared to the sequences of the EST-related nucleic acid,
fragment of an EST-related nucleic acid, positional segment of
an EST-related nucleic acid, or fragment of a positional
segment of an EST-related nucleic acid to determine which 5'
ESTs, contigated consensus 5' ESTs, or extended cDNAs are
expressed in the cell, tissue, organism, or other source of
nucleic acids from which the tags were derived. In this way,
the expression pattern of the 5' ESTs, contigated consensus 5'
ESTs, or extended cDNAs in the cell, tissue, organism, or other
source of nucleic acids is obtained.
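The tag comparison step described above can be illustrated with a
minimal Python sketch. It assumes that the tags have already been
sequenced and that an exact substring match is an acceptable test;
the function name, the reference names and the toy sequences are
hypothetical and are not taken from the sequence listing.

```python
from collections import Counter


def count_tag_hits(tags, references):
    """Count, for each reference sequence, how many observed tags occur in it.

    tags: list of tag sequences (repeats represent repeated observations).
    references: dict mapping a reference name to its nucleotide sequence.
    """
    hits = Counter()
    for tag in tags:
        for name, seq in references.items():
            if tag in seq:  # exact match only; a simplifying assumption
                hits[name] += 1
    return hits


# Hypothetical toy data, for illustration only.
references = {
    "EST_A": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "EST_B": "ATGCATGTTTACCGGAAATTTCCGGTTAACCGGTAG",
}
tags = ["CATGTTTACCGG", "CATGTTTACCGG", "TAATGGGCCGCT"]
hits = count_tag_hits(tags, references)
```

In this sketch the number of tags matching a reference serves as a
rough proxy for the abundance of the corresponding mRNA in the
sampled source.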

[0174] Quantitative analysis of gene expression may also be
performed using arrays. As used herein, the term array means a
one dimensional, two dimensional, or multidimensional
arrangement of EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of EST-related
nucleic acids, or fragments of positional segments of
EST-related nucleic acids. Preferably, the EST-related nucleic
acids, fragments of EST-related nucleic acids, positional
segments of EST-related nucleic acids, or fragments of
positional segments of EST-related nucleic acids are at least
15 nucleotides in length. More preferably, the EST-related
nucleic acids, fragments of EST-related nucleic acids,
positional segments of EST-related nucleic acids, or fragments
of positional segments of EST-related nucleic acids are at
least 100 nucleotides long. More preferably, the fragments are
more than 100 nucleotides in length. In some embodiments, the
EST-related nucleic acids, fragments of EST-related nucleic
acids, positional segments of EST-related nucleic acids, or
fragments of positional segments of EST-related nucleic acids
may be more than 500 nucleotides long.

[0175] For example, quantitative analysis of gene expression
may be performed with EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of EST-related
nucleic acids, or fragments of positional segments of
EST-related nucleic acids in a complementary DNA microarray as
described by Schena et al. (Science 270:467-470, 1995; Proc.
Natl. Acad. Sci. U.S.A. 93:10614-10619, 1996). EST-related
nucleic acids, fragments of EST-related nucleic acids,
positional segments of EST-related nucleic acids, or fragments
of positional segments of EST-related nucleic acids are
amplified by PCR and arrayed from 96-well microtiter plates
onto silylated microscope slides using high-speed robotics.
Printed arrays are incubated in a humid chamber to allow
rehydration of the array elements and rinsed, once in 0.2% SDS
for 1 min, twice in water for 1 min and once for 5 min in
sodium borohydride solution. The arrays are submerged in water
for 2 min at 95°C, transferred into 0.2% SDS for 1 min, rinsed
twice with water, air dried and stored in the dark at 25°C.
[0176] Cell or tissue mRNA is isolated or commercially obtained
and probes are prepared by a single round of reverse
transcription. Probes are hybridized to 1 cm² microarrays under
a 14 x 14 mm glass coverslip for 6-12 hours at 60°C. Arrays are
washed for 5 min at 25°C in low stringency wash buffer (1 x
SSC/0.2% SDS), then for 10 min at room temperature in high
stringency wash buffer (0.1 x SSC/0.2% SDS). Arrays are scanned
in 0.1 x SSC using a fluorescence laser scanning device fitted
with a custom filter set. Accurate differential expression
measurements are obtained by taking the average of the ratios
of two independent hybridizations.
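The averaging of ratios from two independent hybridizations can be
written out as a short Python sketch. The function name and the
representation of each hybridization as a (test signal, reference
signal) pair are assumptions made for illustration; background
subtraction and normalization are not modeled.

```python
def mean_expression_ratio(hyb1, hyb2):
    """Average the test/reference signal ratios of two hybridizations.

    hyb1, hyb2: (test_signal, reference_signal) tuples for one array
    element, one tuple per independent hybridization.
    """
    ratio1 = hyb1[0] / hyb1[1]
    ratio2 = hyb2[0] / hyb2[1]
    return (ratio1 + ratio2) / 2.0


# Hypothetical signal intensities for a single array element.
example_ratio = mean_expression_ratio((200.0, 100.0), (300.0, 100.0))
```

An averaged ratio well above 1 would suggest higher expression in
the test sample than in the reference sample for that element.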
[0177] Quantitative analysis of the expression of genes may
also be performed with EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of EST-related
nucleic acids, or fragments of positional segments of
EST-related nucleic acids in complementary DNA arrays as
described by Pietu et al. (Genome Research, 1996). The
EST-related nucleic acids, fragments of EST-related nucleic
acids, positional segments of EST-related nucleic acids, or
fragments of positional segments of EST-related nucleic acids
are PCR amplified and spotted on membranes. Then, mRNAs
originating from various tissues or cells are labeled with
radioactive nucleotides. After hybridization and washing in
controlled conditions, the hybridized mRNAs are detected by
phospho-imaging or autoradiography. Duplicate experiments are
performed and a quantitative analysis of differentially
expressed mRNAs is then performed.

[0178] Alternatively, expression analysis of the EST-related
nucleic acids, fragments of EST-related nucleic acids,
positional segments of EST-related nucleic acids, or fragments
of positional segments of EST-related nucleic acids can be done
through high density nucleotide arrays as described by Lockhart
et al. (Nature Biotechnology 14:1675-1680, 1996) and Sosnowsky
et al. (Proc. Natl. Acad. Sci. 94:1119-1123, 1997).
Oligonucleotides of 15-50 nucleotides corresponding to
sequences of EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of EST-related
nucleic acids, or fragments of positional segments of
EST-related nucleic acids are synthesized directly on the chip
(Lockhart et al., supra) or synthesized and then addressed to
the chip (Sosnowsky et al., supra). Preferably, the
oligonucleotides are about 20 nucleotides in length.
[0179] cDNA probes labeled with an appropriate compound, such
as biotin, digoxigenin or fluorescent dye, are synthesized from
the appropriate mRNA population and then randomly fragmented to
an average size of 50 to 100 nucleotides. The said probes are
then hybridized to the chip. After washing as described in
Lockhart et al., supra, and application of different electric
fields (Sosnowsky et al., supra), the dyes or labeling
compounds are detected and quantified. Duplicate hybridizations
are performed. Comparative analysis of the intensity of the
signal originating from cDNA probes on the same target
oligonucleotide in different cDNA samples indicates a
differential expression of the mRNA corresponding to the 5'
EST, consensus contigated 5' EST or extended cDNA from which
the oligonucleotide sequence has been designed.

III. Use of 5' ESTs to Clone Extended cDNAs and to
Clone the Corresponding Genomic DNAs

[0180] Once 5' ESTs or consensus contigated 5' ESTs which
include the 5' end of the corresponding mRNAs have been
selected using the procedures described above, they can be
utilized to isolate extended cDNAs which contain sequences
adjacent to the 5' ESTs or contigated consensus 5' ESTs. The
extended cDNAs may include the entire coding sequence of the
protein encoded by the corresponding mRNA, including the
authentic translation start site. If the extended cDNA encodes
a secreted protein, it may contain the signal sequence, and the
sequence encoding the mature protein remaining after cleavage
of the signal peptide. Extended cDNAs which include the entire
coding sequence of the protein encoded by the corresponding
mRNA are referred to herein as "full-length cDNAs."
Alternatively, the extended cDNAs may not include the entire
coding sequence of the protein encoded by the corresponding
mRNA, although they do include sequences adjacent to the 5'
ESTs or contigated consensus 5' ESTs. In some embodiments in
which the extended cDNAs are derived from an mRNA encoding a
secreted protein, the extended cDNAs may include only the
sequence encoding the mature protein remaining after cleavage
of the signal peptide, or only the sequence encoding the signal
peptide.

[0181] Example 17 below describes a general method for
obtaining extended cDNAs using 5' ESTs or consensus contigated
5' ESTs. Example 28 below describes the cloning and sequencing
of several extended cDNAs, including extended cDNAs which
include the entire coding sequence and authentic 5' end of the
corresponding mRNA for several secreted proteins.
[0182] The methods of Examples 17 and 18 can also be used to
obtain extended cDNAs which encode less than the entire coding
sequence of proteins encoded by the genes corresponding to the
5' ESTs or consensus contigated ESTs. In some embodiments, the
extended cDNAs isolated using these methods encode at least 5,
10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive
amino acids of one of the proteins encoded by the sequences of
SEQ ID NOs: 24-4100 and 8178-36681. In some embodiments, the
extended cDNAs isolated using these methods encode at least 5,
10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive
amino acids of one of the proteins encoded by the sequences of
SEQ ID NOs: 24-4100.

EXAMPLE 17 

General Method for Using 5' ESTs to Clone and 
Sequence Extended cDNAs which Include the Entire 
Coding Region and the Authentic 5' End of the
Corresponding mRNA 

[0183] The following general method has been used to quickly
and efficiently isolate extended cDNAs including sequence
adjacent to the sequences of the 5' ESTs used to obtain them.
This method may be applied to obtain extended cDNAs for any 5'
EST or consensus contigated 5' EST of the invention, including
those 5' ESTs and consensus contigated 5' ESTs encoding
secreted proteins. This method is summarized in Figure 3.

1. Obtaining Extended cDNAs 

a) First strand synthesis 

[0184] The method takes advantage of the known 5' sequence of
the mRNA. A reverse transcription reaction is conducted on
purified mRNA with a poly dT primer containing a nucleotide
sequence at its 5' end allowing the addition of a known
sequence at the end of the cDNA which corresponds to the 3' end
of the mRNA. Such a primer and a commercially-available reverse
transcriptase enzyme are added to a buffered mRNA sample,
yielding a reverse transcript anchored at the 3' polyA site of
the RNAs. Nucleotide monomers are then added to complete the
first strand synthesis.
[0185] After removal of the mRNA hybridized to the first cDNA
strand by alkaline hydrolysis, the products of the alkaline
hydrolysis and the residual poly dT primer can be eliminated
with an exclusion column.



b) Second strand synthesis 

[0186] A pair of nested primers on each end is designed based
on the known 5' sequence from the 5' EST or contigated
consensus 5' EST and the known 3' end added by the poly dT
primer used in the first strand synthesis. Software used to
design primers is either based on GC content and melting
temperatures of oligonucleotides, such as OSP (Hillier and
Green, PCR Meth. Appl. 1:124-128, 1991), or based on the
octamer frequency disparity method (Griffais et al., Nucleic
Acids Res. 19:3887-3891, 1991), such as PC-Rare
(http://bioinformatics.weizmann.ac.il/software/PC-Rare/doc/manuel.html).

[0187] Preferably, the nested primers at the 5' end and the
nested primers at the 3' end are separated from one another by
four to nine bases. These primer sequences may be selected to
have melting temperatures and specificities suitable for use in
PCR.
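As a rough illustration of primer selection by GC content and
melting temperature, the following Python sketch computes both
values for a candidate primer. It uses the simple Wallace rule
(2°C per A or T, 4°C per G or C), which is a common approximation
for short oligonucleotides and is not necessarily the calculation
performed by OSP; the function names and the example primer are
hypothetical.

```python
def gc_fraction(primer):
    """Return the fraction of G and C bases in a primer sequence."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Estimate melting temperature in °C with the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); an approximation for short
    oligonucleotides, used here only for illustration.
    """
    primer = primer.upper()
    at_count = primer.count("A") + primer.count("T")
    gc_count = primer.count("G") + primer.count("C")
    return 2 * at_count + 4 * gc_count


# Hypothetical 20-nucleotide primer.
example_primer = "ATGCATGCATGCATGCATGC"
example_gc = gc_fraction(example_primer)
example_tm = wallace_tm(example_primer)
```

Candidate primers for a nested pair could be compared on these two
values so that the outer and inner primers have similar estimated
melting temperatures.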

[0188] A first PCR run is performed using the outer primer from
each of the nested pairs. A second PCR run using the same
enzyme and the inner primer from each of the nested pairs is
then performed on a small sample of the first PCR product.
Thereafter, the primers and remaining nucleotide monomers are
removed.

2. Sequencing of Full Length Extended cDNAs or
Fragments Thereof

[0189] Due to the lack of position constraints on the design of
5' nested primers compatible for PCR use using the OSP
software, amplicons of two types are obtained. Preferably, the
second 5' primer is located upstream of the translation
initiation codon, thus yielding a nested PCR product containing
the entire coding sequence. Such a full length extended cDNA
may be used in a direct cloning procedure. However, in some
cases, the second 5' primer is located downstream of the
translation initiation codon, thereby yielding a PCR product
containing only part of the ORF. Such incomplete PCR products
are submitted to a modified procedure described in section b
below.

a) Nested PCR products containing complete ORFs

[0190] When the resulting nested PCR product contains the
complete coding sequence, as predicted from the 5' EST or
consensus contigated 5' EST sequence, it is cloned in an
appropriate vector.

b) Nested PCR products containing incomplete ORFs 

[0191] When the amplicon does not contain the complete coding
sequence, intermediate steps are necessary to obtain both the
complete coding sequence and a PCR product containing the full
coding sequence. The complete coding sequence can be assembled
from several partial sequences determined directly from
different PCR products.

[0192] Once the full coding sequence has been completely
determined, new primers compatible for PCR use are then
designed to obtain amplicons containing the whole coding
region. However, in such cases, 3' primers compatible for PCR
use are located inside the 3' UTR of the corresponding mRNA,
thus yielding amplicons which lack part of this region, i.e.
the polyA tract and sometimes the polyadenylation signal, as
illustrated in Figure 3. Such full length extended cDNAs are
then cloned into an appropriate vector.

c) Sequencing extended cDNAs 

[0193] Sequencing of extended cDNAs can be performed using a
Dye Terminator approach with the AmpliTaq DNA polymerase FS kit
available from Perkin Elmer.

[0194] In order to sequence PCR fragments, primer walking is
performed using software such as OSP to choose primers and
automated computer software such as ASMG (Sutton et al., Genome
Science Technol. 1:9-19, 1995) to construct contigs of walking
sequences including the initial 5' tag using minimum overlaps
of 32 nucleotides. Preferably, primer walking is performed
until the sequences of full length cDNAs are obtained.
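The role of the 32-nucleotide minimum overlap in contig
construction can be illustrated with a short Python sketch. It
joins two reads only when a suffix of the first exactly matches a
prefix of the second over at least the minimum length. This is a
simplified stand-in rather than the ASMG algorithm: it assumes
error-free reads on the same strand, and the function name and toy
reads are hypothetical.

```python
def merge_with_overlap(left, right, min_overlap=32):
    """Join two reads when a suffix of `left` equals a prefix of `right`.

    The longest exact overlap of at least `min_overlap` nucleotides is
    used. Returns the merged sequence, or None when no such overlap
    exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Hypothetical reads sharing exactly 32 nucleotides.
shared = "ACGT" * 8
read_1 = "TTTTT" + shared
read_2 = shared + "GGGGG"
contig = merge_with_overlap(read_1, read_2)
```

Requiring a long overlap reduces the chance of joining two walking
sequences that share only a short sequence by coincidence.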

3. Cloning of Full Length Extended cDNAs 

[0195] The PCR product containing the full coding sequence is
then cloned in an appropriate vector. For example, the extended
cDNAs can be cloned into any expression vector known in the
art.
[0196] Since the PCR products obtained as described above are
blunt ended molecules that can be cloned in either direction,
the orientation of several clones for each PCR product is
determined. Then, 4 to 10 clones are ordered in microtiter
plates and subjected to a PCR reaction using a first primer
located in the vector close to the cloning site and a second
primer located in the portion of the extended cDNA
corresponding to the 3' end of the mRNA. This second primer may
be the antisense primer used in anchored PCR in the case of
direct cloning (case a) or the antisense primer located inside
the 3' UTR in the case of indirect cloning (case b). Clones in
which the start codon of the extended cDNA is operably linked
to the promoter in the vector so as to permit expression of the
protein encoded by the extended cDNA are conserved and
sequenced. In addition to the ends of cDNA inserts,
approximately 50 bp of vector DNA on each side of the cDNA
insert are also sequenced.

[0197] Cloned PCR products are then entirely sequenced in order
to obtain at least two sequences per clone. Preferably, the
sequences are obtained from both sense and antisense strands
according to the aforementioned procedure with the following
modifications. First, both 5' and 3' ends of cloned PCR
products are sequenced in order to confirm the identity of the
clone. Second, primer walking is performed if the full coding
region has not been obtained yet. Contigation is then performed
using primer walking sequences for cloned products as well as
walking sequences that have already been contigated for
uncloned PCR products. The sequence is considered complete when
the resulting contigs include the whole coding region as well
as overlapping sequences with vector DNA on both ends. All the
contigated sequences for each cloned amplicon are then used to
obtain a consensus sequence.

4. Selection of cloned full length sequences obtained
from the 5' ESTs of the present invention

[0198] A negative selection may be performed in order to
eliminate unwanted cloned sequences resulting from either
contaminants or PCR artifacts as follows. Sequences matching
contaminant sequences such as vector DNA, tRNA, mtRNA, rRNA
sequences are discarded as well as those encoding ORF sequences
exhibiting extensive homology to repeats. Sequences obtained by
direct cloning using nested primers on 5' and 3' tags (section
1, case a) but lacking a polyA tail may be discarded. Only ORFs
containing a signal peptide and ending either before the polyA
tail (case a) or before the end of the cloned 3' UTR (case b)
may be selected. Then, ORFs containing unlikely mature
proteins, such as mature proteins whose size is less than 20
amino acids or less than 25% of the immature protein size, may
be eliminated.
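The size criteria for unlikely mature proteins can be expressed as
a small Python sketch. It assumes that the immature protein size
is the length of the full precursor including the signal peptide,
and that the mature protein is what remains after the signal
peptide is removed; the function and parameter names are
hypothetical.

```python
def is_unlikely_mature_protein(immature_len, signal_peptide_len,
                               min_len=20, min_fraction=0.25):
    """Flag ORFs whose predicted mature protein is implausibly small.

    immature_len: length in amino acids of the precursor (assumed to
        include the signal peptide).
    signal_peptide_len: length in amino acids of the signal peptide.
    """
    mature_len = immature_len - signal_peptide_len
    too_short = mature_len < min_len
    too_small_fraction = mature_len < min_fraction * immature_len
    return too_short or too_small_fraction


# Hypothetical ORF: 100 residue precursor, 25 residue signal peptide.
example_flag = is_unlikely_mature_protein(100, 25)
```

An ORF is flagged when either threshold is crossed, which matches
the "less than 20 amino acids or less than 25%" wording above.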

[0199] Then, for each remaining full length extended cDNA
containing several ORFs, a preselection of ORFs may be
performed using the following criteria. The longest ORF with a
signal peptide is preferred. If the ORF sizes are similar, the
chosen ORF is the one whose signal peptide has the highest
score according to the von Heijne method.
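The ORF preselection rule can be sketched in Python as follows.
The text does not define when ORF sizes count as "similar", so the
10% tolerance used here is an assumption, as are the class name,
the field names and the example scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Orf:
    name: str
    length: int                    # ORF length in amino acids
    signal_score: Optional[float]  # None when no signal peptide is predicted


def preselect_orf(orfs, similar_fraction=0.1):
    """Pick the longest ORF with a signal peptide.

    ORFs within `similar_fraction` of the longest length are treated as
    similar in size (an assumed threshold); among those, the ORF with the
    highest signal peptide score is chosen.
    """
    candidates = [o for o in orfs if o.signal_score is not None]
    if not candidates:
        return None
    longest = max(o.length for o in candidates)
    near_longest = [o for o in candidates
                    if o.length >= (1 - similar_fraction) * longest]
    return max(near_longest, key=lambda o: o.signal_score)


# Hypothetical ORFs found in one extended cDNA.
example_orfs = [
    Orf("a", 300, 5.0),
    Orf("b", 290, 8.0),
    Orf("c", 100, 12.0),
    Orf("d", 400, None),
]
chosen = preselect_orf(example_orfs)
```

In this example ORF "d" is ignored because it has no signal
peptide, and "b" is preferred over "a" because the two are similar
in size and "b" has the higher score.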

[0200] Sequences of full length extended cDNA clones may then
be compared pairwise with BLAST after masking of the repeat
sequences. Sequences containing at least 90% homology over 30
nucleotides may be clustered in the same class. Each cluster
may then be subjected to a cluster analysis that detects
sequences resulting from internal priming or from alternative
splicing, identical sequences or sequences with several
frameshifts. This automatic analysis serves as a basis for
manual selection of the sequences.
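The clustering criterion can be illustrated with a Python sketch
that replaces BLAST with a naive ungapped comparison. It assumes
that "90% homology over 30 nucleotides" can be read as at least 27
identical positions in some pair of 30-nucleotide windows, and it
groups sequences by single linkage; all names and sequences are
hypothetical, and the quadratic scan is suitable only for small
examples.

```python
def share_window(a, b, window=30, min_identity_pct=90):
    """Return True if any ungapped window pair reaches the identity threshold."""
    need = (window * min_identity_pct + 99) // 100  # 27 matches for 30 nt at 90%
    for i in range(len(a) - window + 1):
        window_a = a[i:i + window]
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(window_a, b[j:j + window]))
            if matches >= need:
                return True
    return False


def cluster(seqs):
    """Group sequence indices into classes by single linkage (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if share_window(seqs[i], seqs[j]):
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical clone sequences: seq_2 differs from seq_1 at one position.
seq_1 = "ACGTTGCAAGCTTAGGCATCGATCGGATCCATGGCTAGCT"
seq_2 = seq_1[:5] + "A" + seq_1[6:]
seq_3 = "T" * 40
classes = cluster([seq_1, seq_2, seq_3])
```

Each resulting class could then be passed to the cluster analysis
and manual selection steps.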

[0201] Manual selection can be carried out using automatically
generated reports for each sequenced full length extended cDNA
clone. During this manual procedure, a selection is operated
between clones belonging to the same class as follows.

[0202] Selection of full length extended cDNA clones encoding
sequences of interest is performed using the following
criteria. Structural parameters (initial tag, polyadenylation
site and signal) may be checked. Then, homologies with known
nucleic acids and proteins may be examined in order to
determine whether the clone sequence matches a known nucleic
acid/protein sequence and, in the latter case, its covering
rate and the date at which the sequence became public.
Sequences resulting from chimera or double inserts or located
on chromosome breaking points as assessed by homology to other
sequences may be discarded during this procedure as well.

[0203] Extended cDNAs prepared as described above may be
subsequently engineered to obtain nucleic acids which include
desired portions of the extended cDNA using conventional
techniques such as subcloning, PCR, or in vitro oligonucleotide
synthesis. For example, if the extended cDNA is derived from a
gene encoding a secreted polypeptide, it may include the full
coding sequences (i.e. the sequences encoding the signal
peptide and the mature protein remaining after the signal
peptide is cleaved off), the sequences encoding the mature
polypeptide (i.e. the polypeptide generated after the signal
peptide is cleaved off), or only the coding sequences for the
signal peptides.
[0204] Similarly, nucleic acids containing any other desired
portion of the coding sequences for the encoded protein may be
obtained. For example, the nucleic acid may contain at least
10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200,
300, 500, or 1000 consecutive bases of an extended cDNA.

[0205] Once an extended cDNA has been obtained, 
it can be sequenced to determine the amino acid se- 
quence it encodes. Once the encoded amino acid se- 
quence has been determined, one can create and iden- 
tify any of the many conceivable cDNAs that will encode 
that protein by simply using the degeneracy of the ge- 
netic code. For example, allelic variants or other homol- 
ogous nucleic acids can be identified as described be- 
low. Alternatively, nucleic acids encoding the desired 
amino acid sequence can be synthesized in vitro. 
[0206] In a preferred embodiment, the coding se- 
quence may be selected using the known codon or co- 
don pair preferences for the host organism in which the 
cDNA is to be expressed. 
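Selecting a coding sequence from codon preferences can be
illustrated with a minimal Python back-translation sketch. The
preferred-codon table below is a hypothetical fragment chosen for
illustration and does not describe any particular host organism; a
real table would be derived from codon usage data for the chosen
host. Codon pair preferences are not modeled.

```python
# Hypothetical preferred codon per amino acid (one-letter code); "*" is stop.
# Each codon shown does encode the listed residue in the standard genetic code.
PREFERRED_CODON = {
    "M": "ATG",
    "K": "AAA",
    "L": "CTG",
    "S": "AGC",
    "*": "TAA",
}


def back_translate(protein, table=PREFERRED_CODON):
    """Build one coding sequence for `protein` using the preferred codons.

    Raises KeyError for residues missing from the (deliberately partial)
    table.
    """
    return "".join(table[residue] for residue in protein)


# Hypothetical four-residue peptide followed by a stop codon.
example_cds = back_translate("MKLS*")
```

Because the genetic code is degenerate, this is only one of many
coding sequences that encode the same amino acid sequence.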

[0207] In addition to PCR based methods for obtaining cDNAs
which include the authentic 5' end of the corresponding mRNA as
well as the full protein coding sequence of the corresponding
mRNA, traditional hybridization based methods may also be
employed. These methods may also be used to obtain the genomic
DNAs which encode the mRNAs from which the 5' ESTs or
contigated consensus 5' ESTs were derived, mRNAs corresponding
to the extended cDNAs, or nucleic acids which are homologous to
extended cDNAs, 5' ESTs, or contigated consensus 5' ESTs.
Example 18 below provides examples of such methods.
[0208] Each identified ORF may be scanned for the presence of a
signal peptide in the first 50 amino acids or, where
appropriate, within shorter regions down to 20 amino acids or
less in the ORF, using the matrix method of von Heijne (Nuc.
Acids Res. 14:4683-4690 (1986)) and the modification described
in Example 12.

d) Homology to either nucleotide or protein sequences

[0209] Sequences of full-length extended cDNAs are 
then compared to known nucleotide sequences. 
Polypeptides encoded by full-length extended cDNAs 
are then compared to known polypeptide sequences. 

[0210] Sequences of full-length extended cDNAs are compared to
known nucleic acid sequences such as the vertebrate and EST
sequences of Genbank, EMBL databases and Genseq (Derwent's
database of patented nucleotide sequences). Full-length cDNA
sequences are also compared to the sequences of a private
database (Genset internal sequences) in order to find sequences
that have already been identified by applicants. Sequences of
full-length extended cDNAs with more than 90% homology over 30
nucleotides using either BLASTN or BLAST2N are identified as
sequences that have already been described. Matching vertebrate
sequences are subsequently examined using FASTA; full-length
extended cDNAs with more than 70% homology over 30 nucleotides
are identified as sequences that have already been described.

[0211] ORFs encoded by full-length extended cDNAs as defined in
section c) are subsequently compared to known amino acid
sequences found in public databases such as Swissprot, PIR and
Genpept (Derwent's database of patented protein sequences).
These analyses were performed using BLASTP with the parameter
W=8 and allowing a maximum of 10 matches. Sequences of
full-length extended cDNAs showing extensive homology to known
protein sequences are recognized as already identified
proteins.

[0212] In addition, the three-frame conceptual translation
products of the top strand of full-length extended cDNAs are
compared to publicly known amino acid sequences of Swissprot
using BLASTX with the parameter E=0.001. Sequences of
full-length extended cDNAs with more than 70% homology over 30
amino acid stretches are detected as already identified
proteins.
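The three-frame conceptual translation of the top strand can be
illustrated with a short Python sketch using the standard genetic
code. The function names and the example sequence are hypothetical;
codons containing characters other than A, C, G or T are rendered
as "X", and the database comparison itself is not shown.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA sequence codon by codon; stop codons appear as '*'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translations(seq):
    """Return the translations of the top strand in frames 0, 1 and 2."""
    return [translate(seq[frame:]) for frame in range(3)]


# Hypothetical nine-nucleotide example.
example_frames = three_frame_translations("ATGGCCTAA")
```

Each of the three translations would then be compared against the
protein database to find matches in any reading frame of the top
strand.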

5. Selection of cloned full-length sequences obtained
from the 5' ESTs of the present invention

[0213] Cloned full-length extended cDNA sequences that have
already been characterized by the aforementioned computer
analysis are then submitted to an automatic procedure in order
to preselect full-length extended cDNAs containing sequences of
interest.

a) Automatic sequence preselection 

[0214] All complete cloned full-length extended cDNAs clipped
for vector on both ends are considered. First, a negative
selection is operated in order to eliminate unwanted cloned
sequences resulting from either contaminants or PCR artifacts
as follows. Sequences matching contaminant sequences such as
vector DNA, tRNA, mtRNA, rRNA sequences are discarded as well
as those encoding ORF sequences exhibiting extensive homology
to repeats as defined in section 4 a). Sequences obtained by
direct cloning using nested primers on 5' and 3' tags (section
1, case a) but lacking a polyA tail are discarded. Only ORFs
containing a signal peptide and ending either before the polyA
tail (case a) or before the end of the cloned 3' UTR (case b)
are kept. Then, ORFs containing unlikely mature proteins, such
as mature proteins whose size is less than 20 amino acids or
less than 25% of the immature protein size, are eliminated.

[0215] Then, for each remaining full-length extended cDNA
containing several ORFs, a preselection of ORFs is performed
using the following criteria. The longest ORF with a signal
peptide is preferred. If the ORF sizes are similar, the chosen
ORF is the one whose signal peptide has the highest score
according to the von Heijne method.

[0216] Sequences of full-length extended cDNA 
clones are then compared pairwise with BLAST after 
masking of the repeat sequences. Sequences contain- 
ing at least 90% homology over 30 nucleotides are clus- 
tered in the same class. Each cluster is then subjected 
to a cluster analysis that detects sequences resulting 
from internal priming or from alternative splicing, identi- 
cal sequences or sequences with several frameshifts. 
This automatic analysis serves as a basis for manual 
selection of the sequences. 

b) Manual sequence selection 

[0217] Manual selection can be carried out using automatically
generated reports for each sequenced full-length extended cDNA
clone. During this manual procedure, a selection is operated
between clones belonging to the same class as follows. ORF
sequences encoded by clones belonging to the same class are
aligned and compared. If the homology between nucleotide
sequences of clones belonging to the same class is more than
90% over 30 nucleotide stretches or if the homology between
amino acid sequences of clones belonging to the same class is
more than 80% over 20 amino acid stretches, then the clones are
considered as being identical. The chosen ORF is either the one
exhibiting matches with known amino acid sequences or the best
one according to the criteria mentioned in the automatic
sequence preselection section. If the nucleotide and amino acid
homologies are less than 90% and 80% respectively, the clones
are said to encode distinct proteins which can be both selected
if they contain sequences of interest.
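The decision rule for clones in the same class can be summarized
in a small Python sketch. It assumes that the nucleotide and amino
acid homologies have already been measured over 30-nucleotide and
20-amino-acid stretches respectively and are supplied as
percentages. The text does not say how values of exactly 90% or
80% are handled; treating them as "distinct" is an assumption, and
the function name is hypothetical.

```python
def classify_clone_pair(nt_identity_pct, aa_identity_pct):
    """Classify two clones of the same class as 'identical' or 'distinct'.

    nt_identity_pct: nucleotide homology (%) over 30 nucleotide stretches.
    aa_identity_pct: amino acid homology (%) over 20 amino acid stretches.
    Boundary values (exactly 90 or 80) fall into 'distinct' by assumption.
    """
    if nt_identity_pct > 90 or aa_identity_pct > 80:
        return "identical"
    return "distinct"


# Hypothetical measurements for one pair of clones.
example_call = classify_clone_pair(95.0, 70.0)
```

Either criterion alone is enough to treat the clones as identical,
reflecting the "or" in the rule above.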

[0218] Selection of full-length extended cDNA clones encoding
sequences of interest is performed using the following
criteria. Structural parameters (initial tag, polyadenylation
site and signal) are first checked. Then, homologies with known
nucleic acids and proteins are examined in order to determine
whether the clone sequence matches a known nucleotide/protein
sequence and, in the latter case, its covering rate and the
date at which the sequence became public. If there is no
extensive match with sequences other than ESTs or genomic DNA,
or if the clone sequence brings substantial new information,
such as encoding a protein resulting from alternative splicing
of an mRNA coding for an already known protein, the sequence is
kept. Examples of such cloned full-length extended cDNAs
containing sequences of interest are described in Example 18.
Sequences resulting from chimera or double inserts or located
on chromosome breaking points as assessed by homology to other
sequences are discarded during this procedure.

[0219] Extended cDNAs prepared as described above may be
subsequently engineered to obtain nucleic acids which include
desired portions of the extended cDNA using conventional
techniques such as subcloning, PCR, or in vitro oligonucleotide
synthesis. For example, nucleic acids which include only the
full coding sequences (i.e. the sequences encoding the signal
peptide and the mature protein remaining after the signal
peptide is cleaved off) may be obtained using techniques known
to those skilled in the art. Alternatively, conventional
techniques may be applied to obtain nucleic acids which contain
only the coding sequences for the mature protein remaining
after the signal peptide is cleaved off or nucleic acids which
contain only the coding sequences for the signal peptides.
[0220] Similarly, nucleic acids containing any other
desired portion of the coding sequences for the encoded
protein may be obtained. For example, the nucleic acid
may contain at least 10, 15, 18, 20, 25, 28, 30, 35, 40,
50, 75, 100, 150, 200, 300, 400 or 500 consecutive
bases of an extended cDNA.

[0221] Once an extended cDNA has been obtained,
it can be sequenced to determine the amino acid
sequence it encodes. Once the encoded amino acid
sequence has been determined, one can create and identify
any of the many conceivable cDNAs that will encode
that protein by simply using the degeneracy of the
genetic code. For example, allelic variants or other
homologous nucleic acids can be identified as described
below. Alternatively, nucleic acids encoding the desired
amino acid sequence can be synthesized in vitro.
[0222] In a preferred embodiment, the coding
sequence may be selected using the known codon or
codon pair preferences for the host organism in which the
cDNA is to be expressed.

[0223] In addition to PCR based methods for obtaining
cDNAs which include the authentic 5' end of the
corresponding mRNA as well as the complete protein
coding sequence of the corresponding mRNA, traditional
hybridization based methods may also be employed.
These methods may also be used to obtain the genomic
DNAs which encode the mRNAs from which the 5' ESTs
or consensus contigated 5' ESTs were derived, mRNAs
corresponding to the extended cDNAs, or nucleic acids
which are homologous to extended cDNAs, 5' ESTs, or
consensus contigated 5' ESTs. Example 18 below
provides examples of such methods.

EXAMPLE 18 

Methods for Obtaining Extended cDNAs which Include
the Entire Coding Region and the Authentic 5' End of the
Corresponding mRNA or Nucleic Acids Homologous to
Extended cDNAs, 5' ESTs or Consensus Contigated 5'
ESTs

[0224] A full-length cDNA library can be made using
the strategies described in Examples 1-4 above by
replacing the random nonamer used in Example 2 with an
oligo-dT primer. Alternatively, a cDNA library or genomic
DNA library may be obtained from a commercial source
or made using techniques familiar to those skilled in the
art.

[0225] Such cDNA or genomic DNA libraries may be
used to isolate extended cDNAs obtained from 5' ESTs
or consensus contigated 5' ESTs or nucleic acids
homologous to extended cDNAs, 5' ESTs, or consensus
contigated 5' ESTs as follows. The cDNA library or
genomic DNA library is hybridized to a detectable probe.
The detectable probe may comprise at least 10, 15, 18,
20, 25, 28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400
or 500 consecutive nucleotides of the 5' EST, consensus
contigated 5' EST, or extended cDNA.
[0226] Techniques for identifying cDNA clones in a
cDNA library which hybridize to a given probe sequence
are disclosed in Sambrook et al., Molecular Cloning: A
Laboratory Manual 2d Ed., Cold Spring Harbor
Laboratory Press, 1989. The same techniques may be used to
isolate genomic DNAs.

[0227] Briefly, cDNA or genomic DNA clones which 
hybridize to the detectable probe are identified and iso- 
lated for further manipulation as follows. The detectable 
probe described in the preceding paragraph is labeled 
with a detectable label such as a radioisotope or a fluo- 
rescent molecule. Techniques for labeling the probe are 
well known and include phosphorylation with
polynucleotide kinase, nick translation, in vitro
transcription, and non-radioactive techniques. The cDNAs or genomic
DNAs in the library are transferred to a nitrocellulose or 
nylon filter and denatured. After blocking of non specific 
sites, the filter is incubated with the labeled probe for an 
amount of time sufficient to allow binding of the probe 
to cDNAs or genomic DNAs containing a sequence ca- 
pable of hybridizing thereto. 

[0228] By varying the stringency of the hybridization
conditions used to identify cDNAs or genomic DNAs
which hybridize to the detectable probe, cDNAs or
genomic DNAs having different levels of homology to the
probe can be identified and isolated as described below.

1. Identification of cDNA or Genomic DNA Sequences
Having a High Degree of Homology to the Labeled
Probe

[0229] To identify cDNAs or genomic DNAs having a
high degree of homology to the probe sequence, the
melting temperature of the probe may be calculated
using the following formulas:

[0230] For probes between 14 and 70 nucleotides in
length the melting temperature (Tm) is calculated using
the formula:

Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction G+C) - (600/N)

where N is the length of the probe.

[0231] If the hybridization is carried out in a solution
containing formamide, the melting temperature may be
calculated using the equation:

Tm = 81.5 + 16.6(log [Na+]) + 0.41(fraction G+C) - (0.63% formamide) - (600/N)

where N is the length of the probe.
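
As a rough illustration, the two melting temperature formulas above can be written as a single function. This is a sketch under assumptions that the text does not spell out: the logarithm is taken as base 10, [Na+] is in mol/L, and the formamide term is read as 0.63 multiplied by the percentage of formamide. The text labels the G+C term "fraction G+C", whereas the widely cited form of this equation applies the 0.41 coefficient to G+C content expressed as a percentage; the sketch simply takes the value as given and leaves the scale to the caller. The function and parameter names are hypothetical.

```python
import math


def melting_temperature(na_molar: float, gc_term: float, length: int,
                        formamide_percent: float = 0.0) -> float:
    """Tm in degrees C for a probe of `length` nucleotides.

    na_molar          -- sodium ion concentration (assumed mol/L)
    gc_term           -- G+C content on whatever scale the 0.41
                         coefficient is meant to multiply (assumption)
    formamide_percent -- percent formamide; 0 gives the first formula
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    if length <= 0:
        raise ValueError("probe length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_term
            - 0.63 * formamide_percent
            - 600.0 / length)
```

With `formamide_percent` left at 0 the function reduces to the first formula, so the same code covers both cases.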

[0232] Prehybridization may be carried out in 6X SSC,
5X Denhardt's reagent, 0.5% SDS, 100 µg denatured
fragmented salmon sperm DNA or 6X SSC, 5X
Denhardt's reagent, 0.5% SDS, 100 µg denatured
fragmented salmon sperm DNA, 50% formamide. The formulas
for SSC and Denhardt's solutions are listed in Sambrook
et al., supra.

[0233] Hybridization is conducted by adding the
detectable probe to the prehybridization solutions listed
above. Where the probe comprises double stranded
DNA, it is denatured before addition to the hybridization
solution. The filter is contacted with the hybridization
solution for a sufficient period of time to allow the probe to
hybridize to extended cDNAs or genomic DNAs
containing sequences complementary thereto or homologous
thereto. For probes over 200 nucleotides in length, the
hybridization may be carried out at 15-25°C below the
Tm. For shorter probes, such as oligonucleotide probes,
the hybridization may be conducted at 15-25°C below
the Tm. Preferably, for hybridizations in 6X SSC, the
hybridization is conducted at approximately 68°C.
Preferably, for hybridizations in 50% formamide containing
solutions, the hybridization is conducted at approximately
42°C.

[0234] All of the foregoing hybridizations would be 

considered to be under 'stringent' conditions. 

[0235] Following hybridization, the filter is washed in
2X SSC, 0.1% SDS at room temperature for 15 minutes.
The filter is then washed with 0.1X SSC, 0.5% SDS at
room temperature for 30 minutes to 1 hour. Thereafter,
the solution is washed at the hybridization temperature
in 0.1X SSC, 0.5% SDS. A final wash is conducted in
0.1X SSC at room temperature.

[0236] cDNAs or genomic DNAs which have hybrid- 
ized to the probe are identified by autoradiography or 
other conventional techniques. 

2. Obtaining cDNA or Genomic DNA Sequences Having
Lower Degrees of Homology to the Labeled Probe

[0237] The above procedure may be modified to
identify cDNAs or genomic DNAs having decreasing levels
of homology to the probe sequence. For example, to
obtain cDNAs or genomic DNAs of decreasing homology
to the detectable probe, less stringent conditions may
be used. For example, the hybridization temperature
may be decreased in increments of 5°C from 68°C to
42°C in a hybridization buffer having a sodium
concentration of approximately 1 M. Following hybridization, the
filter may be washed with 2X SSC, 0.5% SDS at the
temperature of hybridization. These conditions are
considered to be "moderate" conditions above 50°C and "low"
conditions below 50°C.
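
The temperature series and the moderate/low labels in the paragraph above can be sketched as follows. Two points are assumptions of this illustration, not statements from the text: 5°C steps down from 68°C do not land exactly on 42°C, so the sketch appends 42°C as the final step, and a temperature of exactly 50°C is left unclassified because the text only defines "above" and "below". Function names are hypothetical.

```python
from typing import List, Optional


def hybridization_temperatures(start: int = 68, stop: int = 42,
                               step: int = 5) -> List[int]:
    """Temperatures in degrees C, descending from `start` in `step`
    decrements, with `stop` appended if the steps do not reach it."""
    temps = list(range(start, stop - 1, -step))
    if temps[-1] != stop:
        temps.append(stop)  # assumption: always finish at the stated endpoint
    return temps


def stringency_label(temperature_c: float) -> Optional[str]:
    """'moderate' above 50 C, 'low' below 50 C, None at exactly 50 C."""
    if temperature_c > 50:
        return "moderate"
    if temperature_c < 50:
        return "low"
    return None
```

For example, `[(t, stringency_label(t)) for t in hybridization_temperatures()]` pairs each step of the series with its label.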

[0238] Alternatively, the hybridization may be carried
out in buffers, such as 6X SSC, containing formamide
at a temperature of 42°C. In this case, the concentration
of formamide in the hybridization buffer may be reduced
in 5% increments from 50% to 0% to identify clones
having decreasing levels of homology to the probe.
Following hybridization, the filter may be washed with 6X SSC,
0.5% SDS at 50°C. These conditions are considered to
be "moderate" conditions above 25% formamide and
"low" conditions below 25% formamide.
[0239] cDNAs or genomic DNAs which have hybrid- 
ized to the probe are identified by autoradiography. 

3. Determination of the Degree of Homology between
the Obtained cDNAs or Genomic DNAs and 5' ESTs,
Consensus Contigated 5' ESTs, or Extended cDNAs or
Between the Polypeptides Encoded by the Obtained
cDNAs or Genomic DNAs and the Polypeptides
Encoded by the 5' ESTs, Consensus Contigated 5' ESTs,
or Extended cDNAs



[0240] To determine the level of homology between 
the hybridized cDNA or genomic DNA and the 5'EST,
consensus contigated 5'EST or extended cDNA from 
which the probe was derived, the nucleotide sequences 
of the hybridized nucleic acid and the 5'EST, consensus 
contigated 5'EST or extended cDNA from which the 
probe was derived are compared. The sequences of the 
5'EST, consensus contigated 5'EST or extended cDNA 
from which the probe was derived and the sequences 
of the cDNA or genomic DNA which hybridized to the 
detectable probe may be stored on a computer readable 
medium as described below and compared to one an- 
other using any of a variety of algorithms familiar to 
those skilled in the art, including those described below.
[0241] To determine the level of homology between
the polypeptide encoded by the hybridizing cDNA or
genomic DNA and the polypeptide encoded by the 5'EST,
consensus contigated 5'EST or extended cDNA from
which the probe was derived, the polypeptide sequence
encoded by the hybridized nucleic acid and the
polypeptide sequence encoded by the 5'EST, consensus
contigated 5'EST or extended cDNA from which the probe
was derived are compared. The sequences of the
polypeptide encoded by the 5'EST, consensus contigated
5'EST or extended cDNA from which the probe was
derived and the polypeptide sequence encoded by the
cDNA or genomic DNA which hybridized to the
detectable probe may be stored on a computer readable
medium as described below and compared to one another
using any of a variety of algorithms familiar to those
skilled in the art, including those described below.
[0242] Protein and/or nucleic acid sequence
homologies may be evaluated using any of the variety of
sequence comparison algorithms and programs known in
the art. Such algorithms and programs include, but are
by no means limited to, TBLASTN, BLASTP, FASTA,
TFASTA, and CLUSTALW (Pearson and Lipman, 1988,
Proc. Natl. Acad. Sci. USA 85(8):2444-2448; Altschul et
al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al.,
1994, Nucleic Acids Res. 22(2):4673-4680; Higgins et
al., 1996, Methods Enzymol. 266:383-402; Altschul et
al., 1990, J. Mol. Biol. 215(3):403-410; Altschul et al.,
1993, Nature Genetics 3:266-272).
[0243] In a particularly preferred embodiment, protein
and nucleic acid sequence homologies are evaluated
using the Basic Local Alignment Search Tool ("BLAST")
which is well known in the art (see, e.g., Karlin and
Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268;
Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul
et al., 1993, Nature Genetics 3:266-272; Altschul et al.,
1997, Nuc. Acids Res. 25:3389-3402). In particular, five
specific BLAST programs are used to perform the
following task:

(1) BLASTP and BLAST3 compare an amino acid
query sequence against a protein sequence
database;

(2) BLASTN compares a nucleotide query se- 
quence against a nucleotide sequence database; 

(3) BLASTX compares the six-frame conceptual 
translation products of a query nucleotide sequence 
(both strands) against a protein sequence data- 
base; 

(4) TBLASTN compares a query protein sequence 
against a nucleotide sequence database translated 
in all six reading frames (both strands); and

(5) TBLASTX compares the six-frame translations 
of a nucleotide query sequence against the six- 
frame translations of a nucleotide sequence data- 
base. 
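
The list above amounts to a lookup from query type and database type to the applicable program. The sketch below encodes it as a small table purely for illustration; the structure and function name are hypothetical, the program names are those of the list (with the second program of item (1) read as BLAST3), and no claim is made about how any BLAST distribution is actually invoked.

```python
from typing import Dict, List, Tuple

# (query type, database type) -> programs from the list above.
# TBLASTX and BLASTX translate nucleotide sequences in six frames
# before comparison, as described in items (3) and (5).
BLAST_PROGRAMS: Dict[Tuple[str, str], List[str]] = {
    ("protein", "protein"): ["BLASTP", "BLAST3"],
    ("nucleotide", "nucleotide"): ["BLASTN", "TBLASTX"],
    ("nucleotide", "protein"): ["BLASTX"],
    ("protein", "nucleotide"): ["TBLASTN"],
}


def programs_for(query_type: str, database_type: str) -> List[str]:
    """Return the programs listed for a query/database combination."""
    key = (query_type, database_type)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unsupported combination: {key}")
    return list(BLAST_PROGRAMS[key])
```

For a nucleotide query against a nucleotide database, the choice between BLASTN and TBLASTX depends on whether the comparison is made directly or through six-frame translations.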



[0244] The BLAST programs identify homologous
sequences by identifying similar segments, which are
referred to herein as "high-scoring segment pairs,"
between a query amino or nucleic acid sequence and a
test sequence which is preferably obtained from a
protein or nucleic acid sequence database. High-scoring
segment pairs are preferably identified (i.e., aligned) by
means of a scoring matrix, many of which are known in
the art. Preferably, the scoring matrix used is the
BLOSUM62 matrix (Gonnet et al., 1992, Science 256:
1443-1445; Henikoff and Henikoff, 1993, Proteins 17:
49-61). Less preferably, the PAM or PAM250 matrices
may also be used (see, e.g., Schwartz and Dayhoff,
eds., 1978, Matrices for Detecting Distance
Relationships: Atlas of Protein Sequence and Structure,
Washington: National Biomedical Research Foundation).
[0245] The BLAST programs evaluate the statistical
significance of all high-scoring segment pairs identified,
and preferably select those segments which satisfy a
user-specified threshold of significance, such as a
user-specified percent homology. Preferably, the statistical
significance of a high-scoring segment pair is evaluated
using the statistical significance formula of Karlin (see,
e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci.
USA 87:2267-2268).

[0246] The parameters used with the above algo- 
rithms may be adapted depending on the sequence 
length and degree of homology studied. In some em- 
bodiments, the parameters may be the default parame- 
ters used by the algorithms in the absence of instruc- 
tions from the user. 

[0247] In some embodiments, the level of homology
between the hybridized nucleic acid and the extended
cDNA, 5'EST, or 5' consensus contigated EST from
which the probe was derived may be determined using
the FASTDB algorithm described in Brutlag et al. Comp.
App. Biosci. 6:237-245, 1990. In such analyses the
parameters may be selected as follows: Matrix=Unitary,
k-tuple=4, Mismatch Penalty=1, Joining Penalty=30,
Randomization Group Length=0, Cutoff Score=1, Gap
Penalty=5, Gap Size Penalty=0.05, Window Size=500 or the
length of the sequence which hybridizes to the probe,
whichever is shorter. Because the FASTDB program
does not consider 5' or 3' truncations when calculating
homology levels, if the sequence which hybridizes to the
probe is truncated relative to the sequence of the
extended cDNA, 5'EST, or consensus contigated 5'EST
from which the probe was derived the homology level is
manually adjusted by calculating the number of
nucleotides of the extended cDNA, 5'EST, or consensus
contigated 5' EST which are not matched or aligned with
the hybridizing sequence, determining the percentage
of total nucleotides of the hybridizing sequence which
the non-matched or non-aligned nucleotides represent,
and subtracting this percentage from the homology
level. For example, if the hybridizing sequence is 700
nucleotides in length and the extended cDNA, 5'EST, or
consensus contigated 5' EST sequence is 1000
nucleotides in length wherein the first 300 bases at the 5' end
of the extended cDNA, 5'EST, or consensus contigated
5' EST are absent from the hybridizing sequence, and
wherein the overlapping 700 nucleotides are identical,
the homology level would be adjusted as follows. The
non-matched, non-aligned 300 bases represent 30% of
the length of the extended cDNA, 5'EST, or consensus
contigated 5' EST. If the overlapping 700 nucleotides are
100% identical, the adjusted homology level would be
100-30=70% homology. It should be noted that the
preceding adjustments are only made when the
non-matched or non-aligned nucleotides are at the 5' or 3'
ends. No adjustments are made if the non-matched or
non-aligned sequences are internal or under any other
conditions.
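
The manual adjustment in the worked example above is simple arithmetic and can be sketched as follows. The function and parameter names are hypothetical. Following the worked example, the penalty is computed as a percentage of the length of the reference sequence (the extended cDNA, 5'EST, or consensus contigated 5' EST), and it is assumed that the caller passes only terminal (5' or 3') unmatched positions, since the text excludes internal ones.

```python
def adjusted_homology(aligned_homology_percent: float,
                      reference_length: int,
                      unmatched_terminal: int) -> float:
    """Subtract from the reported homology level the percentage of the
    reference sequence that is unmatched or unaligned at its 5' or 3' end.

    aligned_homology_percent -- homology reported over the aligned region
    reference_length         -- length of the reference sequence
    unmatched_terminal       -- reference positions at the ends with no
                                counterpart in the other sequence
    """
    if reference_length <= 0:
        raise ValueError("reference length must be positive")
    if not 0 <= unmatched_terminal <= reference_length:
        raise ValueError("unmatched count must lie within the reference length")
    penalty = 100.0 * unmatched_terminal / reference_length
    return aligned_homology_percent - penalty
```

With the figures from the example, `adjusted_homology(100, 1000, 300)` gives 70.0.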

[0248] For example, using the above methods,
nucleic acids having at least 95% nucleic acid homology, at
least 96% nucleic acid homology, at least 97% nucleic
acid homology, at least 98% nucleic acid homology, at
least 99% nucleic acid homology, or more than 99%
nucleic acid homology to the extended cDNA, 5'EST, or
consensus contigated 5' EST from which the probe was
derived may be obtained and identified. Such nucleic
acids may be allelic variants or related nucleic acids
from other species. Similarly, by using progressively
less stringent hybridization conditions one can obtain
and identify nucleic acids having at least 90%, at least
85%, at least 80% or at least 75% homology to the
extended cDNA, 5'EST, or consensus contigated 5' EST
from which the probe was derived.
[0249] Using the above methods and algorithms such
as FASTA with parameters depending on the sequence
length and degree of homology studied, for example the
default parameters used by the algorithms in the
absence of instructions from the user, one can obtain
nucleic acids encoding proteins having at least 99%, at
least 98%, at least 97%, at least 96%, at least 95%, at
least 90%, at least 85%, at least 80% or at least 75%
homology to the protein encoded by the extended
cDNA, 5'EST, or consensus contigated 5' EST from which
the probe was derived. In some embodiments, the
homology levels can be determined using the "default"
opening penalty and the "default" gap penalty, and a
scoring matrix such as PAM 250 (a standard scoring
matrix; see Dayhoff et al., in: Atlas of Protein Sequence and
Structure, Vol. 5, Supp. 3 (1978)).
[0250] Alternatively, the level of polypeptide
homology may be determined using the FASTDB algorithm
described by Brutlag et al. Comp. App. Biosci. 6:237-245,
1990. In such analyses the parameters may be selected
as follows: Matrix=PAM 0, k-tuple=2, Mismatch
Penalty=1, Joining Penalty=20, Randomization Group
Length=0, Cutoff Score=1, Window Size=Sequence
Length, Gap Penalty=5, Gap Size Penalty=0.05,
Window Size=500 or the length of the homologous
sequence, whichever is shorter. If the homologous amino
acid sequence is shorter than the amino acid sequence
encoded by the extended cDNA, 5'EST, or consensus
contigated 5' EST as a result of an N terminal and/or C
terminal deletion the results may be manually corrected
as follows. First, the number of amino acid residues of
the amino acid sequence encoded by the extended
cDNA, 5'EST, or consensus contigated 5' EST which are
not matched or aligned with the homologous sequence
is determined. Then, the percentage of the length of the
sequence encoded by the extended cDNA, 5'EST, or
consensus contigated 5' EST which the non-matched or
non-aligned amino acids represent is calculated. This
percentage is subtracted from the homology level. For
example wherein the amino acid sequence encoded by
the extended cDNA, 5'EST or consensus contigated 5'
EST is 100 amino acids in length and the length of the
homologous sequence is 80 amino acids and wherein
the amino acid sequence encoded by the extended
cDNA or 5'EST is truncated at the N terminal end with
respect to the homologous sequence, the homology level
is calculated as follows. In the preceding scenario there
are 20 non-matched, non-aligned amino acids in the
sequence encoded by the extended cDNA, 5'EST, or
consensus contigated 5' EST. This represents 20% of the
length of the amino acid sequence encoded by the
extended cDNA, 5'EST, or consensus contigated 5' EST.
If the remaining amino acids are 100% identical between
the two sequences, the homology level would be
100%-20%=80% homology. No adjustments are made if the
non-matched or non-aligned sequences are internal or
under any other conditions.

[0251] In addition to the above described methods,
other protocols are available to obtain extended cDNAs
using 5' ESTs or consensus contigated 5'ESTs as
outlined in the following paragraphs.
[0252] Extended cDNAs may be prepared by obtaining
mRNA from the tissue, cell, or organism of interest
using mRNA preparation procedures utilizing polyA se- 
lection procedures or other techniques known to those 
skilled in the art. A first primer capable of hybridizing to 
the polyA tail of the mRNA is hybridized to the mRNA 
and a reverse transcription reaction is performed to gen- 
erate a first cDNA strand. 

[0253] The first cDNA strand is hybridized to a second
primer containing at least 10 consecutive nucleotides of
the sequences of SEQ ID NOs 24-4100 and
8178-36681. Preferably, the primer comprises at least
10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive
nucleotides from the sequences of SEQ ID NOs 24-4100 and
8178-36681. In some embodiments, the primer
comprises more than 30 nucleotides from the sequences of
SEQ ID NOs 24-4100 and 8178-36681. If it is desired
to obtain extended cDNAs containing the full protein
coding sequence, including the authentic translation
initiation site, the second primer used contains sequences
located upstream of the translation initiation site. The
second primer is extended to generate a second cDNA
strand complementary to the first cDNA strand.
Alternatively, RT-PCR may be performed as described above
using primers from both ends of the cDNA to be
obtained.

[0254] Extended cDNAs containing 5' fragments of
the mRNA may be prepared by hybridizing an mRNA
comprising the sequences of SEQ ID NOs: 24-4100 and
8178-36681 with a primer comprising a sequence
complementary to a fragment of an EST-related nucleic acid,
hybridizing the primer to the mRNAs, and reverse
transcribing the hybridized primer to make a first cDNA
strand from the mRNAs. Preferably, the primer comprises
at least 10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive
nucleotides of the sequences complementary to SEQ ID
NOs: 24-4100 and 8178-36681.



[0255] Thereafter, a second cDNA strand comple- 
mentary to the first cDNA strand is synthesized. The 
second cDNA strand may be made by hybridizing a 
primer complementary to sequences in the first cDNA 

strand to the first cDNA strand and extending the primer
to generate the second cDNA strand. 
[0256] The double stranded extended cDNAs made 
using the methods described above are isolated and 
cloned. The extended cDNAs may be cloned into vectors
such as plasmids or viral vectors capable of replicating
in an appropriate host cell. For example, the host
cell may be a bacterial, mammalian, avian, or insect cell. 
[0257] Techniques for isolating mRNA, reverse
transcribing a primer hybridized to mRNA to generate a first
cDNA strand, extending a primer to make a second
cDNA strand complementary to the first cDNA strand,
isolating the double stranded cDNA and cloning the double
stranded cDNA are well known to those skilled in the art
and are described in Current Protocols in Molecular
Biology, John Wiley & Sons, Inc. 1997 and Sambrook et
al., Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor Laboratory Press, 1989.
[0258] Alternatively, other procedures may be used
for obtaining full-length cDNAs or extended cDNAs. In
one approach, full-length or extended cDNAs are
prepared from mRNA and cloned into double stranded
phagemids as follows. The cDNA library in the double
stranded phagemids is then rendered single stranded
by treatment with an endonuclease, such as the Gene
II product of the phage F1, and an exonuclease (Chang
et al., Gene 127:96-8, 1993). A biotinylated
oligonucleotide comprising the sequence of a fragment of an
EST-related nucleic acid is hybridized to the single stranded
phagemids. Preferably, the fragment comprises at least
10, 12, 15, 17, 18, 20, 23, 25, or 28 consecutive
nucleotides of the sequences of SEQ ID NOs: 24-4100 and
8178-36681.

[0259] Hybrids between the biotinylated
oligonucleotide and phagemids are isolated by incubating the
hybrids with streptavidin coated paramagnetic beads and
retrieving the beads with a magnet (Fry et al.,
Biotechniques, 13:124-131, 1992). Thereafter, the resulting
phagemids are released from the beads and converted
into double stranded DNA using a primer specific for the
5' EST or consensus contigated 5'EST sequence used
to design the biotinylated oligonucleotide. Alternatively,
protocols such as the Gene Trapper kit (Gibco BRL) may
be used. The resulting double stranded DNA is
transformed into bacteria. Extended cDNAs or full length
cDNAs containing the 5' EST or consensus contigated
5'EST sequence are identified by colony PCR or colony
hybridization.

[0260] Using any of the above described methods in
section III, a plurality of extended cDNAs containing
full-length protein coding sequences or portions of the
protein coding sequences may be provided as cDNA
libraries for subsequent evaluation of the encoded proteins
or use in diagnostic assays as described below.

EXAMPLE 19

Full Length cDNAs 

[0261] The procedures described in Example 17 and 
18 were used to obtain 376 extended cDNAs or full 
length cDNAs derived from 5' ESTs in a variety of tis- 
sues. The following list provides a few examples of thus 
obtained cDNAs. 

[0262] Using this procedure, the full length cDNA of
SEQ ID NO:1 (internal identification number
58-34-2-E7-FL2) was obtained. This cDNA encodes the
signal peptide MWWFQQGLSFLPSALVIWTSA (SEQ
ID NO:2) having a von Heijne score of 5.5.
[0263] Using this approach, the full length cDNA of
SEQ ID NO:3 (internal identification number
48-19-3-G1-FL1) was obtained. This cDNA encodes the
signal peptide MKKVLLLITAILAVAVG (SEQ ID NO:4)
having a von Heijne score of 8.2.
[0264] The full length cDNA of SEQ ID NO:5 (internal
identification number 58-35-2-F10-FL2) was also
obtained using this procedure. This cDNA encodes a
signal peptide LWLLFFLVTAIHA (SEQ ID NO:6) having a
von Heijne score of 10.7.

[0265] Furthermore, the polypeptides encoded by the 
extended or full-length cDNAs may be screened for the 
presence of known structural or functional motifs or for 
the presence of signatures, small amino acid sequences 
which are well conserved amongst the members of a 
protein family. The results obtained for the polypeptides 
encoded by a few full-length cDNAs derived from 5'ESTs
that were screened for the presence of known protein 
signatures and motifs using the Proscan software from 
the GCG package and the Prosite 15.0 database are 
provided below. 

[0266] The protein of SEQ ID NO: 8 encoded by the
full-length cDNA SEQ ID NO: 7 (internal designation
78-8-3-E6-CL0_1C) and expressed in adult prostate
belongs to the phosphatidylethanolamine-binding protein
family, of which it exhibits the characteristic PROSITE
signature from positions 90 to 112. Proteins from this
widespread family, found from nematodes to fly, yeast,
rodent and primate species, bind hydrophobic ligands such
as phospholipids and nucleotides. They are mostly
expressed in brain and in testis and are thought to play a
role in cell growth and/or maturation, in regulation of the
sperm maturation, motility and in membrane remodeling.
They may act either through signal transduction or
through oxidoreduction reactions (for a review see
Schoentgen and Jolles, FEBS Letters, 369:22-26 (1995)).
Taken together, these data suggest that the protein of
SEQ ID NO: 8 may play a role in cell growth, maturation
and in membrane remodeling and/or may be related to
male fertility. Thus, this protein may be useful in
diagnosing and/or treating cancer, neurodegenerative
diseases, and/or disorders related to male fertility and
sterility.

[0267] The protein of SEQ ID NO: 10 encoded by the
full-length cDNA SEQ ID NO: 9 (internal designation
108-013-5-0-H9-FLC) shows homologies with a family
of lysophospholipases conserved among eukaryotes
(yeast, rabbit, rodents and human). In addition, some
members of this family exhibit a calcium-independent
phospholipase A2 activity (Portilla et al., J. Am. Soc.
Nephrol., 9:1178-1186 (1998)). All members of this family
exhibit the active site consensus GXSXG motif of
carboxylesterases that is also found in the protein of SEQ
ID NO: 10 (position 54 to 58). In addition, this protein
may be a membrane protein with one transmembrane
domain as predicted by the software TopPred II (Claros
and von Heijne, CABIOS applic. Notes, 10:685-686
(1994)). Taken together, these data suggest that the
protein of SEQ ID NO: 10 may play a role in fatty acid
metabolism, probably as a phospholipase. Thus, this
protein or part thereof may be useful in diagnosing and/or
treating several disorders including, but not limited to,
cancer, diabetes, and neurodegenerative disorders
such as Parkinson's and Alzheimer's diseases. It may
also be useful in modulating inflammatory responses to
infectious agents and/or to suppress graft rejection.
[0268] The protein of SEQ ID NO: 12 encoded by the
full-length cDNA SEQ ID NO: 11 (internal designation
108-004-5-0-D10-FLC) shows remote homology to a
subfamily of beta4-galactosyltransferases widely
conserved in animals (human, rodents, cow and chicken).
Such enzymes, usually type II membrane proteins
located in the endoplasmic reticulum or in the Golgi
apparatus, catalyze the biosynthesis of glycoproteins,
glycolipid glycans and lactose. Their characteristic
features, defined as those of subfamily A in Breton et al., J.
Biochem., 123:1000-1009 (1998), are well conserved
in the protein of SEQ ID NO: 12, especially the
region I containing the DVD motif (positions 163-165)
thought to be involved either in UDP binding or in the
catalytic process itself. In addition, the protein of SEQ
ID NO: 12 has the typical structure of a type II protein.
Indeed, it contains a short 28-amino-acid-long
N-terminal tail, a transmembrane segment from positions 29 to
49 and a large 278-amino-acid-long C-terminal tail as
predicted by the software TopPred II (Claros and von
Heijne, CABIOS applic. Notes, 10:685-686 (1994)).
Taken together, these data suggest that the protein of
SEQ ID NO: 12 may play a role in the biosynthesis of
polysaccharides, and of the carbohydrate moieties of
glycoproteins and glycolipids and/or in cell-cell
recognition. Thus, this protein may be useful in diagnosing and/
or treating several types of disorders including, but not
limited to, cancer, atherosclerosis, cardiovascular
disorders, autoimmune disorders and rheumatic diseases
including rheumatoid arthritis.

[0269] The protein of SEQ ID NO: 14 encoded by the
full-length cDNA SEQ ID NO: 13 (internal designation
108-009-5-0-A2-FLC) shows extensive homology to the
bZIP family of transcription factors, and especially to the
human Luman protein (Lu et al., Mol. Cell. Biol., 17:
5117-5126 (1997)). The match includes the whole bZIP
domain composed of a basic DNA-binding domain and
of a leucine zipper allowing protein dimerization. The
basic domain is conserved in the protein of SEQ ID NO:
14 as shown by the characteristic PROSITE signature
(positions 224-237) except for a conservative substitution
of a glutamic acid with an aspartic acid in position
233. The typical PROSITE signature for leucine zipper
is also present (positions 259 to 280). Taken together,
these data suggest that the protein of SEQ ID NO: 14
may bind to DNA, hence regulating gene expression as
a transcription factor. Thus, this protein may be useful
in diagnosing and/or treating several types of disorders
including, but not limited to, cancer.
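
As an illustration of how such a signature can be located, the sketch below scans a protein sequence for the PROSITE leucine zipper pattern (PS00029, L-x(6)-L-x(6)-L-x(6)-L). The pattern spans 22 residues, which matches the width of the interval 259 to 280 cited above. The function name and the toy sequence are hypothetical; the sequence is not SEQ ID NO: 14.

```python
import re

# PROSITE PS00029: L-x(6)-L-x(6)-L-x(6)-L (22 residues in total).
# The lookahead lets overlapping matches be reported as well.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return (start, end) pairs, 1-based and inclusive, for each pattern match."""
    hits = []
    for match in LEUCINE_ZIPPER.finditer(protein):
        start = match.start() + 1                 # 0-based index to 1-based position
        end = match.start() + len(match.group(1))
        hits.append((start, end))
    return hits


# Toy sequence (hypothetical): 10 filler residues followed by one zipper.
toy = "M" + "A" * 9 + "LAAAAAAL" + "AAAAAAL" + "AAAAAAL" + "GG"
print(find_leucine_zippers(toy))
```

A pattern hit of this kind is only suggestive; the leucine zipper signature is short and is known to occur by chance in unrelated proteins.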
[0270] Bacterial clones containing plasmids containing
the full length cDNAs described above are presently
stored in the inventor's laboratories under the internal
identification numbers provided above. The inserts may
be recovered from the deposited materials by growing
an aliquot of the appropriate bacterial clone in the
appropriate medium. The plasmid DNA can then be isolated
using plasmid isolation procedures familiar to those
skilled in the art such as alkaline lysis minipreps or large
scale alkaline lysis plasmid isolation procedures. If
desired, the plasmid DNA may be further enriched by
centrifugation on a cesium chloride gradient, size exclusion
chromatography, or anion exchange chromatography.
The plasmid DNA obtained using these procedures may
then be manipulated using standard cloning techniques
familiar to those skilled in the art. Alternatively, a PCR
can be done with primers designed at both ends of the
EST insertion. The PCR product which corresponds to
the 5' EST can then be manipulated using standard
cloning techniques familiar to those skilled in the art.

IV. Expression of Proteins 

[0271] EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of
EST-related nucleic acids, and fragments of positional segments
of EST-related nucleic acids may be used to express the
polypeptides which they encode. In particular, they may
be used to express EST-related polypeptides, fragments
of EST-related polypeptides, positional segments
of EST-related polypeptides, or fragments of positional
segments of EST-related polypeptides. In some
embodiments, the EST-related nucleic acids, positional
segments of EST-related nucleic acids, and fragments of
positional segments of EST-related nucleic acids may
be used to express the full polypeptide (i.e. the signal
peptide and the mature polypeptide) of a secreted
protein, the mature protein (i.e. the polypeptide generated
after cleavage of the signal peptide), or the signal
peptide of a secreted protein. If desired, nucleic acids
encoding the signal peptide may be used to facilitate
secretion of the expressed protein. It will be appreciated
that a plurality of EST-related nucleic acids, fragments
of EST-related nucleic acids, positional segments of
EST-related nucleic acids, or fragments of positional
segments of EST-related nucleic acids may be
simultaneously cloned into expression vectors to create an
expression library for analysis of the encoded proteins as
described below.


EXAMPLE 20 

Expression of the Proteins Encoded by the Genes
Corresponding to the 5' ESTs or Consensus Contigated
5' ESTs

[0272] To express their encoded proteins, the
EST-related nucleic acids, fragments of EST-related nucleic
acids, positional segments of EST-related nucleic acids,
or fragments of positional segments of EST-related
nucleic acids are cloned into a suitable expression vector.
In some instances, nucleic acids encoding EST-related
polypeptides, fragments of EST-related polypeptides,
positional segments of EST-related polypeptides or
fragments of positional segments of EST-related
polypeptides may be cloned into a suitable expression
vector.

[0273] In some embodiments, the nucleic acids
inserted into the expression vector may comprise the
coding sequence of a sequence selected from the group
consisting of SEQ ID NOs: 24-4100. In other embodiments,
the nucleic acids inserted into the expression vector may
comprise the full coding sequence (i.e. the
nucleotides encoding the signal peptide and the mature
polypeptide) of one of SEQ ID NOs: 3721-3811. In some
embodiments, the nucleic acid inserted into the
expression vector may comprise the nucleotides of one of the
sequences of SEQ ID NOs: 3721-3811 which encode
the mature polypeptide (i.e. the nucleotides encoding
the polypeptide generated after cleavage of the signal
peptide). In further embodiments, the nucleic acids
inserted into the expression vector may comprise the
nucleotides of SEQ ID NOs: 24-652 and 3721-3811 which
encode the signal peptide to facilitate secretion of the
expressed protein. The nucleic acids inserted into the
expression vectors may also contain sequences upstream of the
sequences encoding the signal peptide, such as sequences
which regulate expression levels or sequences which
confer tissue specific expression.
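
The distinction drawn above between the full coding sequence, the nucleotides encoding the signal peptide, and the nucleotides encoding the mature polypeptide can be sketched as a simple in-frame split. This is a minimal illustration under stated assumptions: the function name is hypothetical, the signal peptide length is supplied by the caller (for example from a cleavage site prediction), and the toy sequence is not one of the listed SEQ ID NOs.

```python
def split_coding_sequence(cds, signal_peptide_codons):
    """Split a coding sequence into (signal peptide part, mature polypeptide part).

    `cds` starts at the initiation codon; `signal_peptide_codons` is the number
    of codons removed by signal peptide cleavage (an input, not computed here).
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    boundary = 3 * signal_peptide_codons
    if not 0 < boundary < len(cds):
        raise ValueError("signal peptide must be shorter than the coding sequence")
    return cds[:boundary], cds[boundary:]


# Toy coding sequence (hypothetical): 3 signal peptide codons, then the rest.
signal_part, mature_part = split_coding_sequence("ATGAAACTGGCTGCTTAA", 3)
print(signal_part, mature_part)
```

Splitting on a codon boundary keeps the mature region in frame, which matters if it is later fused to a heterologous signal peptide or purification tag.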

[0274] The nucleic acid inserted into the expression
vector may encode a polypeptide comprising one of
the sequences of SEQ ID NOs: 4101-8177. In some
embodiments, the nucleic acid inserted into the expression
vector may encode the full polypeptide sequence (i.e.
the signal peptide and the mature polypeptide) included
in one of SEQ ID NOs: 7798-7888. In other
embodiments, the nucleic acid inserted into the expression
vector may encode the mature polypeptide (i.e. the
polypeptide generated after cleavage of the signal
peptide) included in one of the sequences of SEQ ID NOs:
7798-7888. In further embodiments, the nucleic acids
inserted into the expression vector may encode the signal
peptide included in one of the sequences of SEQ ID NOs:
4101-4729 and 7798-7888.

[0275] The nucleic acid encoding the protein or
polypeptide to be expressed is operably linked to a
promoter in an expression vector using conventional
cloning technology. The expression vector may be any of
the mammalian, yeast, insect or bacterial expression
systems known in the art. Commercially available
vectors and expression systems are available from a variety
of suppliers including Genetics Institute (Cambridge,
MA), Stratagene (La Jolla, California), Promega
(Madison, Wisconsin), and Invitrogen (San Diego, California).
If desired, to enhance expression and facilitate proper
protein folding, the codon context and codon pairing of
the sequence may be optimized for the particular
expression organism into which the expression vector is
introduced, as explained by Hatfield et al., U.S. Patent
No. 5,082,767.

[0276] The following is provided as one exemplary
method to express the proteins encoded by the nucleic
acids described above. In some instances the nucleic
acid encoding the protein or polypeptide to be
expressed includes a methionine initiation codon and a
polyA signal. If the nucleic acid encoding the
polypeptide to be expressed lacks a methionine to serve as the
initiation site, an initiating methionine can be introduced
next to the first codon of the nucleic acid using
conventional techniques. Similarly, if the nucleic acid encoding
the protein or polypeptide to be expressed lacks a polyA
signal, this sequence can be added to the construct by,
for example, splicing out the polyA signal from pSG5
(Stratagene) using BglI and SalI restriction
endonuclease enzymes and incorporating it into the mammalian
expression vector pXT1 (Stratagene). pXT1 contains
the LTRs and a portion of the gag gene from Moloney
Murine Leukemia Virus. The position of the LTRs in the
construct allows efficient stable transfection. The vector
includes the Herpes Simplex thymidine kinase promoter
and the selectable neomycin gene. The nucleic acid
encoding the polypeptide to be expressed is obtained by
PCR from the bacterial vector using oligonucleotide
primers complementary to the nucleic acid encoding the
protein or polypeptide to be expressed and containing
restriction endonuclease sequences for PstI incorporated
into the 5' primer and BglII at the 5' end of the 3' primer,
taking care to ensure that the nucleic acid encoding the
protein or polypeptide to be expressed is correctly
positioned with respect to the polyA signal. The purified
fragment obtained from the resulting PCR reaction is
digested with PstI, blunt ended with an exonuclease,
digested with BglII, purified and ligated to pXT1, now
containing a polyA signal and digested with BglII.
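
The primer layout described above (a PstI site on the 5' primer and a BglII site at the 5' end of the 3' primer) can be sketched as follows. This is a simplified illustration, not a validated primer design procedure: the function names, the annealing length, and the toy insert are assumptions, and practical designs usually also add a few extra bases 5' of each site and check melting temperatures. The recognition sequences used are the standard ones for PstI (CTGCAG) and BglII (AGATCT).

```python
PST_I = "CTGCAG"    # PstI recognition sequence
BGL_II = "AGATCT"   # BglII recognition sequence
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(insert, anneal_len=18):
    """Return (5' primer, 3' primer), each written 5'->3' with its site at the 5' end."""
    if len(insert) < anneal_len:
        raise ValueError("insert is shorter than the annealing region")
    forward = PST_I + insert[:anneal_len]
    reverse = BGL_II + reverse_complement(insert[-anneal_len:])
    return forward, reverse


def has_internal_site(insert):
    """True if the insert itself contains a PstI or BglII site (would be cut on digestion)."""
    return PST_I in insert or BGL_II in insert


# Toy insert (hypothetical), short annealing region for readability.
toy_insert = "ATGGCCAAATTTGGGCCCTAA"
print(design_primers(toy_insert, anneal_len=6))
print(has_internal_site(toy_insert))
```

Checking for internal sites first is a practical safeguard, since an insert carrying its own PstI or BglII site would be fragmented by the digestion steps described in the paragraph.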
[0277] The ligated product is transfected into mouse
NIH 3T3 cells using Lipofectin (Life Technologies, Inc.,
Grand Island, New York) under conditions outlined in the
product specification. Positive transfectants are selected
after growing the transfected cells in 600 µg/ml G418
(Sigma, St. Louis, Missouri).

[0278] Alternatively, the nucleic acid encoding the
protein or polypeptide to be expressed may be cloned
into pED6dpc2 as described above. The resulting
pED6dpc2 constructs may be transfected into a suitable
host cell, such as COS 1 cells. Methotrexate resistant
cells are selected and expanded. The expressed protein
or polypeptide may be isolated, purified, or enriched as
described above.

[0279] To confirm expression of the desired protein or
polypeptide, the proteins or polypeptides produced by
cells containing a vector with a nucleic acid insert
encoding the protein or polypeptide are compared to those
lacking such an insert. The expressed proteins are
detected using techniques familiar to those skilled in the
art such as Coomassie blue or silver staining or using
antibodies against the protein or polypeptide encoded
by the nucleic acid insert. Antibodies capable of
specifically recognizing the protein of interest may be
generated using synthetic 15-mer peptides having a
sequence encoded by the appropriate nucleic acid. The
synthetic peptides are injected into mice to generate
antibody to the polypeptide encoded by the nucleic acid.
[0280] If the proteins or polypeptides encoded by the
nucleic acid inserts are secreted, medium prepared
from the host cells or organisms containing an
expression vector which contains a nucleic acid insert
encoding the desired protein or polypeptide is compared to
medium prepared from the control cells or organism.
The presence of a band in medium from the cells
containing the nucleic acid insert which is absent from
preparations from the control cells indicates that the protein
or polypeptide encoded by the nucleic acid insert is
being expressed and secreted. Generally, the band
corresponding to the protein encoded by the nucleic acid
insert will have a mobility near that expected based on the
number of amino acids in the open reading frame of the
nucleic acid insert. However, the band may have a
mobility different from that expected as a result of
modifications such as glycosylation, ubiquitination, or
enzymatic cleavage.
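
The mobility expected from the number of amino acids in the open reading frame can be approximated with a back-of-the-envelope calculation. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the text; the function names and the 15% tolerance are likewise illustrative choices.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not from the source


def expected_mass_kda(n_residues, average_residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Approximate the unmodified polypeptide mass in kDa from its length."""
    if n_residues <= 0:
        raise ValueError("number of residues must be positive")
    return n_residues * average_residue_mass_da / 1000.0


def mobility_is_near_expected(observed_kda, n_residues, tolerance=0.15):
    """True if the observed band lies within `tolerance` (fractional) of the estimate."""
    expected = expected_mass_kda(n_residues)
    return abs(observed_kda - expected) <= tolerance * expected


print(expected_mass_kda(300))                 # estimate for a 300-residue ORF
print(mobility_is_near_expected(35.0, 300))   # band close to the estimate
print(mobility_is_near_expected(50.0, 300))   # band shifted, e.g. by glycosylation
```

A band that falls well outside the estimate does not rule out expression; as the paragraph notes, modifications or cleavage can shift the apparent size.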

[0281] Alternatively, if the protein expressed from the
above expression vectors does not contain sequences
directing its secretion, the proteins expressed from host
cells containing an expression vector with an insert
encoding a secreted protein or portion thereof can be
compared to the proteins expressed in control host cells
containing the expression vector without an insert. The
presence of a band in samples from cells containing the
expression vector with an insert which is absent in
samples from cells containing the expression vector without
an insert indicates that the desired protein or portion
thereof is being expressed. Generally, the band will
have the mobility expected for the secreted protein or
portion thereof. However, the band may have a mobility
different from that expected as a result of modifications
such as glycosylation, ubiquitination, or enzymatic
cleavage.

[0282] The expressed protein or polypeptide may be
purified, isolated or enriched using a variety of methods.
In some methods, the protein or polypeptide may be
secreted into the culture medium via a native signal
peptide or a heterologous signal peptide operably linked
thereto. In some methods, the protein or polypeptide
may be linked to a heterologous polypeptide which
facilitates its isolation, purification, or enrichment such as
a nickel binding polypeptide. The protein or polypeptide
may also be obtained by gel electrophoresis, ion
exchange chromatography, size chromatography, HPLC,
salt precipitation, immunoprecipitation, a combination of
any of the preceding methods, or any of the isolation,
purification, or enrichment techniques familiar to those
skilled in the art.

[0283] The protein encoded by the nucleic acid insert
may also be purified using standard immunoaffinity
chromatography techniques
with antibodies directed against the encoded protein or
polypeptide as described in more detail below. If
antibody production is not possible, the nucleic acid insert
encoding the desired protein or polypeptide may be
incorporated into expression vectors designed for use in
purification schemes employing chimeric polypeptides.
In such strategies, the coding sequence of the nucleic
acid insert is ligated in frame with the gene encoding the
other half of the chimera. The other half of the chimera
may be β-globin or a nickel binding polypeptide. A
chromatography matrix having antibody to β-globin or nickel
attached thereto is then used to purify the chimeric
protein. Protease cleavage sites may be engineered
between the β-globin gene or the nickel binding
polypeptide and the extended cDNA or portion thereof. Thus,
the two polypeptides of the chimera may be separated
from one another by protease digestion.
[0284] One useful expression vector for generating
β-globin chimerics is pSG5 (Stratagene), which encodes
rabbit β-globin. Intron II of the rabbit β-globin gene
facilitates splicing of the expressed transcript, and the
polyadenylation signal incorporated into the construct
increases the level of expression. These techniques as
described are well known to those skilled in the art of
molecular biology. Standard methods are published in
methods texts such as Davis et al. (Basic Methods in
Molecular Biology, L.G. Davis, M.D. Dibner, and J.F.
Battey, ed., Elsevier Press, NY, 1986) and many of the
methods are available from Stratagene, Life
Technologies, Inc., or Promega. Polypeptide may additionally be
produced from the construct using in vitro translation
systems such as the In vitro Express™ Translation Kit
(Stratagene).

[0285] Following expression and purification of the
proteins or polypeptides encoded by the nucleic acid
inserts, the purified proteins may be tested for the ability
to bind to the surface of various cell types as described
in Example 21 below. It will be appreciated that a
plurality of proteins expressed from these nucleic acid inserts
may be included in a panel of proteins to be
simultaneously evaluated for the activities specifically described
below, as well as other biological roles for which assays
for determining activity are available.
EXAMPLE 21

Analysis of Secreted Proteins to Determine Whether
They Bind to the Cell Surface

[0286] The EST-related nucleic acids, fragments of
EST-related nucleic acids, positional segments of
EST-related nucleic acids, fragments of positional segments
of EST-related nucleic acids, nucleic acids encoding the
EST-related polypeptides, nucleic acids encoding
fragments of the EST-related polypeptides, nucleic acids
encoding positional segments of EST-related
polypeptides, or nucleic acids encoding fragments of positional
segments of EST-related polypeptides are cloned into
expression vectors such as those described in Example
20. The encoded proteins or polypeptides are purified,
isolated, or enriched as described above. Following
purification, isolation, or enrichment, the proteins or
polypeptides are labeled using techniques known to
those skilled in the art. The labeled proteins or
polypeptides are incubated with cells or cell lines derived from
a variety of organs or tissues to allow the proteins to
bind to any receptor present on the cell surface.
Following the incubation, the cells are washed to remove
non-specifically bound proteins or polypeptides. The
specifically bound labeled proteins or polypeptides are
detected by autoradiography. Alternatively, unlabeled proteins
or polypeptides may be incubated with the cells and
detected with antibodies having a detectable label, such
as a fluorescent molecule, attached thereto.
[0287] Specificity of cell surface binding may be
analyzed by conducting a competition analysis in which
various amounts of unlabeled protein or polypeptide are
incubated along with the labeled protein or polypeptide.
The amount of labeled protein or polypeptide bound to
the cell surface decreases as the amount of competitive
unlabeled protein or polypeptide increases. As a control,
various amounts of an unlabeled protein or polypeptide
unrelated to the labeled protein or polypeptide are
included in some binding reactions. The amount of labeled
protein or polypeptide bound to the cell surface does not
decrease in binding reactions containing increasing
amounts of unrelated unlabeled protein, indicating that
the protein or polypeptide encoded by the nucleic acid
binds specifically to the cell surface.
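
The expected outcome of the competition analysis can be illustrated with a simple one-site equilibrium binding model. This is a sketch under stated assumptions and not part of the described procedure: the labeled and unlabeled forms of the same protein are assumed to share one dissociation constant, the unrelated control protein is assumed not to bind the site at all, and the function name and concentration units are arbitrary.

```python
def labeled_fraction_bound(labeled, competitor, kd, competes=True):
    """Fraction of binding sites occupied by labeled protein (one-site model).

    With a homologous competitor of equal affinity, occupancy by the labeled
    protein is labeled / (kd + labeled + competitor). An unrelated protein
    (competes=False) is modeled as having no effect.
    """
    if kd <= 0 or labeled < 0 or competitor < 0:
        raise ValueError("kd must be positive and concentrations non-negative")
    effective_competitor = competitor if competes else 0.0
    return labeled / (kd + labeled + effective_competitor)


# Arbitrary units: labeled protein at 1, kd of 1, increasing unlabeled protein.
for unlabeled in (0.0, 2.0, 8.0):
    specific = labeled_fraction_bound(1.0, unlabeled, 1.0)
    control = labeled_fraction_bound(1.0, unlabeled, 1.0, competes=False)
    print(unlabeled, round(specific, 3), round(control, 3))
```

In this model the labeled signal falls as homologous unlabeled protein is added, while it stays constant with the unrelated control, which is the pattern the paragraph describes as indicating specific binding.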
[0288] As discussed above, human proteins have
been shown to have a number of important physiological
effects and, consequently, represent a valuable
therapeutic resource. The human proteins or polypeptides
made as described above may be evaluated to
determine their physiological activities as described below.

EXAMPLE 22

Assaying the Expressed Proteins or Polypeptides for 
Cytokine, Cell Proliferation or Cell Differentiation
Activity 

[0289] As discussed above, some human proteins act
as cytokines or may affect cellular proliferation or
differentiation. Many protein factors discovered to date,
including all known cytokines, have exhibited activity in
one or more factor dependent cell proliferation assays,
and hence the assays serve as a convenient
confirmation of cytokine activity. The activity of a protein or
polypeptide of the present invention is evidenced by any
one of a number of routine factor dependent cell
proliferation assays for cell lines including, without limitation,
32D, DA2, DA1G, T10, B9, B9/11, BaF3, MC9/G,
M+ (preB M+), 2E8, RB5, DA1, 123, T1165, HT2, CTLL2,
TF-1, Mo7e and CMK. The proteins or polypeptides
prepared as described above may be evaluated for their
ability to regulate T cell or thymocyte proliferation in
assays such as those described above or in the following
references: Current Protocols in Immunology, Ed. by J.
E. Coligan et al., Greene Publishing Associates and
Wiley-Interscience; Takai et al., J. Immunol. 137:
3494-3500, 1986; Bertagnolli et al., J. Immunol. 145:
1706-1712, 1990; Bertagnolli et al., Cellular
Immunology 133:327-341, 1991; Bertagnolli et al., J. Immunol.
149:3778-3783, 1992; Bowman et al., J. Immunol. 152:
1756-1761, 1994.

[0290] In addition, numerous assays for cytokine
production and/or the proliferation of spleen cells, lymph
node cells and thymocytes are known. These include
the techniques disclosed in Current Protocols in
Immunology, J.E. Coligan et al. Eds., 1:3.12.1-3.12.14,
John Wiley and Sons, Toronto, 1994; and Schreiber, R.
D. in Current Protocols in Immunology, supra 1:
6.8.1-6.8.8.

[0291] The proteins or polypeptides prepared as
described above may also be assayed for the ability to
regulate the proliferation and differentiation of
hematopoietic or lymphopoietic cells. Many assays for such activity
are familiar to those skilled in the art, including the
assays in the following references: Bottomly et al., in
Current Protocols in Immunology, supra 1:6.3.1-6.3.12;
deVries et al., J. Exp. Med. 173:1205-1211, 1991;
Moreau et al., Nature 336:690-692, 1988; Greenberger
et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938, 1983;
Nordan, R., in Current Protocols in Immunology, supra
1:6.6.1-6.6.5; Smith et al., Proc. Natl. Acad. Sci. U.S.A.
83:1857-1861, 1986; Bennett et al. in Current
Protocols in Immunology, supra 1:6.15.1; Ciarletta et al. in
Current Protocols in Immunology, supra 1:6.13.1.
[0292] The proteins or polypeptides prepared as
described above may also be assayed for their ability to
regulate T-cell responses to antigens. Many assays for
such activity are familiar to those skilled in the art,
including the assays described in the following
references: Chapter 3 (In vitro Assays for Mouse Lymphocyte
Function), Chapter 6 (Cytokines and Their Cellular
Receptors) and Chapter 7 (Immunologic Studies in
Humans) in Current Protocols in Immunology, supra;
Weinberger et al., Proc. Natl. Acad. Sci. USA 77:6091-6095,
1980; Weinberger et al., Eur. J. Immun. 11:405-411,
1981; Takai et al., J. Immunol. 137:3494-3500, 1986;
Takai et al., J. Immunol. 140:508-512, 1988.
[0293] Those proteins or polypeptides which exhibit
cytokine, cell proliferation, or cell differentiation activity
may then be formulated as pharmaceuticals and used
to treat clinical conditions in which induction of cell
proliferation or differentiation is beneficial. Alternatively, as
described in more detail below, nucleic acids encoding
these proteins or polypeptides or nucleic acids
regulating the expression of these proteins or polypeptides may
be introduced into appropriate host cells to increase or
decrease the expression of the proteins or polypeptides
as desired.

EXAMPLE 23 

Assaying the Expressed Proteins or Polypeptides for 
Activity as Immune System Regulators 

[0294] The proteins or polypeptides prepared as
described above may also be evaluated for their effects as
immune regulators. For example, the proteins or
polypeptides may be evaluated for their activity to
influence thymocyte or splenocyte cytotoxicity. Numerous
assays for such activity are familiar to those skilled in
the art including the assays described in the following
references: Chapter 3 (In vitro Assays for Mouse
Lymphocyte Function 3.1-3.19) and Chapter 7 (Immunologic
Studies in Humans) in Current Protocols in
Immunology, J.E. Coligan et al. Eds, Greene Publishing
Associates and Wiley-Interscience; Herrmann et al.,
Proc. Natl. Acad. Sci. USA 78:2488-2492, 1981;
Herrmann et al., J. Immunol. 128:1968-1974, 1982; Handa et
al., J. Immunol. 135:1564-1572, 1985; Takai et al., J.
Immunol. 137:3494-3500, 1986; Takai et al., J.
Immunol. 140:508-512, 1988; Bowman et al., J. Virology
1992-1998; Bertagnolli et al., Cell. Immunol. 133:
327-341, 1991; Brown et al., J. Immunol. 153:
3079-3092, 1994.

[0295] The proteins or polypeptides prepared as
described above may also be evaluated for their effects on
T-cell dependent immunoglobulin responses and
isotype switching. Numerous assays for such activity are
familiar to those skilled in the art, including the assays
disclosed in the following references: Maliszewski, J.
Immunol. 144:3028-3033, 1990; Mond et al. in Current
Protocols in Immunology, 1:3.8.1-3.8.16, supra.
[0296] The proteins or polypeptides prepared as
described above may also be evaluated for their effect on
immune effector cells, including their effect on Th1 cells
and cytotoxic lymphocytes. Numerous assays for such
activity are familiar to those skilled in the art, including
the assays disclosed in the following references:
Chapter 3 (In vitro Assays for Mouse Lymphocyte Function
3.1-3.19) and Chapter 7 (Immunologic Studies in
Humans) in Current Protocols in Immunology, supra; Takai
et al., J. Immunol. 137:3494-3500, 1986; Takai et al., J.
Immunol. 140:508-512, 1988; Bertagnolli et al., J.
Immunol. 149:3778-3783, 1992.

[0297] The proteins or polypeptides prepared as
described above may also be evaluated for their effect on
dendritic cell mediated activation of naive T-cells.
Numerous assays for such activity are familiar to those
skilled in the art, including the assays disclosed in the
following references: Guery et al., J. Immunol. 134:
536-544, 1995; Inaba et al., J. Exp. Med. 173:549-559,
1991; Macatonia et al., J. Immunol. 154:5071-5079,
1995; Porgador et al., J. Exp. Med. 182:255-260, 1995;
Nair et al., J. Virol. 67:4062-4069, 1993; Huang et al.,
Science 264:961-965, 1994; Macatonia et al., J. Exp.
Med. 169:1255-1264, 1989; Bhardwaj et al., Journal of
Clinical Investigation 94:797-807, 1994; and Inaba et
al., J. Exp. Med. 172:631-640, 1990.
[0298] The proteins or polypeptides prepared as
described above may also be evaluated for their influence
on the lifetime of lymphocytes. Numerous assays for
such activity are familiar to those skilled in the art,
including the assays disclosed in the following references:
Darzynkiewicz et al., Cytometry 13:795-808, 1992;
Gorczyca et al., Leukemia 7:659-670, 1993; Gorczyca
et al., Cancer Res. 53:1945-1951, 1993; Itoh et al., Cell
66:233-243, 1991; Zacharchuk, J. Immunol. 145:
4037-4045, 1990; Zamai et al., Cytometry 14:891-897,
1993; Gorczyca et al., Int. J. Oncol. 1:639-648, 1992.
[0299] The proteins or polypeptides prepared as
described above may also be evaluated for their influence
on early steps of T-cell commitment and development.
Numerous assays for such activity are familiar to those
skilled in the art, including without limitation the assays
disclosed in the following references: Antica et al., Blood
84:111-117, 1994; Fine et al., Cell. Immunol. 155:
111-122, 1994; Galy et al., Blood 85:2770-2778, 1995;
Toki et al., Proc. Nat. Acad. Sci. USA 88:7548-7551,
1991.

[0300] Those proteins or polypeptides which exhibit
activity as immune system regulators may then
be formulated as pharmaceuticals and used to treat
clinical conditions in which regulation of immune activity is
beneficial. For example, the protein or polypeptide may
be useful in the treatment of various immune
deficiencies and disorders (including severe combined
immunodeficiency), e.g., in regulating (up or down) growth and
proliferation of T and/or B lymphocytes, as well as
affecting the cytolytic activity of NK cells and other cell
populations. These immune deficiencies may be
genetic or be caused by viral (e.g., HIV) as well as bacterial
or fungal infections, or may result from autoimmune
disorders. More specifically, infectious diseases caused by
viral, bacterial, fungal or other infection may be treatable
using the protein or polypeptide, including infections by
HIV, hepatitis viruses, herpesviruses, mycobacteria,
Leishmania spp., Plasmodium, and various fungal
infections such as candidiasis. Of course, in this regard, a
protein or polypeptide may also be useful where a boost
to the immune system generally may be desirable, i.e.,
in the treatment of cancer.

[0301] Alternatively, the proteins or polypeptides
prepared as described above may be used in treatment of
autoimmune disorders including, for example,
connective tissue disease, multiple sclerosis, systemic lupus
erythematosus, rheumatoid arthritis, autoimmune
pulmonary inflammation, Guillain-Barre syndrome,
autoimmune thyroiditis, insulin dependent diabetes mellitus,
myasthenia gravis, graft-versus-host disease and
autoimmune inflammatory eye disease. Such a protein or
polypeptide may also be useful in the treatment of
allergic reactions and conditions, such as asthma
(particularly allergic asthma) or other respiratory problems.
Other conditions in which immune suppression is
desired (including, for example, organ transplantation)
may also be treatable using the protein or polypeptide.
[0302] Using the proteins or polypeptides of the
invention it may also be possible to regulate immune
responses either up or down. Down regulation may involve
inhibiting or blocking an immune response already in
progress or may involve preventing the induction of an
immune response. The functions of activated T-cells
may be inhibited by suppressing T cell responses or by
inducing specific tolerance in T cells, or both.
Immunosuppression of T cell responses is generally an active
non-antigen-specific process which requires continuous
exposure of the T cells to the suppressive agent.
Tolerance, which involves inducing non-responsiveness or
anergy in T cells, is distinguishable from
immunosuppression in that it is generally antigen-specific and
persists after the end of exposure to the tolerizing agent.
Operationally, tolerance can be demonstrated by the
lack of a T cell response upon reexposure to specific
antigen in the absence of the tolerizing agent.

[0303] Down regulating or preventing one or more
antigen functions (including without limitation B
lymphocyte antigen functions, such as, for example, B7
costimulation), e.g., preventing high level lymphokine
synthesis by activated T cells, will be useful in situations
of tissue, skin and organ transplantation and in
graft-versus-host disease (GVHD). For example, blockage of
T cell function should result in reduced tissue
destruction in tissue transplantation. Typically, in tissue
transplants, rejection of the transplant is initiated through its
recognition as foreign by T cells, followed by an immune
reaction that destroys the transplant. The administration
of a molecule which inhibits or blocks interaction of a B7
lymphocyte antigen with its natural ligand(s) on immune
cells (such as a soluble, monomeric form of a peptide
having B7-2 activity alone or in conjunction with a
monomeric form of a peptide having an activity of another B
lymphocyte antigen (e.g., B7-1, B7-3) or blocking
antibody), prior to transplantation, can lead to the binding
of the molecule to the natural ligand(s) on the immune
cells without transmitting the corresponding
costimulatory signal. Blocking B lymphocyte antigen function in
this manner prevents cytokine synthesis by immune cells,
such as T cells, and thus acts as an
immunosuppressant. Moreover, the lack of costimulation may also be
sufficient to anergize the T cells, thereby inducing
tolerance in a subject. Induction of long-term tolerance by B
lymphocyte antigen-blocking reagents may avoid the
necessity of repeated administration of these blocking
reagents. To achieve sufficient immunosuppression or
tolerance in a subject, it may also be necessary to block
the function of a combination of B lymphocyte antigens.
[0304] The efficacy of particular blocking reagents in
preventing organ transplant rejection or GVHD can be
assessed using animal models that are predictive of
efficacy in humans. Examples of appropriate systems
which can be used include allogeneic cardiac grafts in
rats and xenogeneic pancreatic islet cell grafts in mice,
both of which have been used to examine the
immunosuppressive effects of CTLA4Ig fusion proteins in vivo
as described in Lenschow et al., Science 257:789-792
(1992) and Turka et al., Proc. Natl. Acad. Sci. USA, 89:
11102-11105 (1992). In addition, murine models of
GVHD (see Paul ed., Fundamental Immunology, Raven
Press, New York, 1989, pp. 846-847) can be used to
determine the effect of blocking B lymphocyte antigen
function in vivo on the development of that disease.
[0305] Blocking antigen function may also be
therapeutically useful for treating autoimmune diseases.
Many autoimmune disorders are the result of inappropriate
activation of T cells that are reactive against self
tissue and which promote the production of cytokines
and autoantibodies involved in the pathology of the
diseases. Preventing the activation of autoreactive T cells
may reduce or eliminate disease symptoms. Administration
of reagents which block costimulation of T cells
by disrupting receptor/ligand interactions of B
lymphocyte antigens can be used to inhibit T cell activation
and prevent production of autoantibodies or T cell-derived
cytokines which are potentially involved in the disease
process. Additionally, blocking reagents may induce
antigen-specific tolerance of autoreactive T cells which
could lead to long-term relief from the disease. The
efficacy of blocking reagents in preventing or alleviating
autoimmune disorders can be determined using a
number of well-characterized animal models of human
autoimmune diseases. Examples include murine
experimental autoimmune encephalitis, systemic lupus
erythematosus in MRL/lpr/lpr mice or NZB hybrid mice, murine
autoimmune collagen arthritis, diabetes mellitus in NOD
mice and BB rats, and murine experimental myasthenia
gravis (see Paul ed., Fundamental Immunology, Raven
Press, New York, 1989, pp. 840-856).
[0306] Upregulation of an antigen function (preferably
a B lymphocyte antigen function), as a means of
upregulating immune responses, may also be useful in
therapy. Upregulation of immune responses may involve
either enhancing an existing immune response or eliciting
an initial immune response as shown by the following
examples. For instance, enhancing an immune response
through stimulating B lymphocyte antigen function
may be useful in cases of viral infection. In addition,
systemic viral diseases such as influenza, the common
cold, and encephalitis might be alleviated by the
administration of a stimulatory form of B lymphocyte antigens
systemically.

[0307] Alternatively, antiviral immune responses may
be enhanced in an infected patient by removing T cells
from the patient, costimulating the T cells in vitro with
viral antigen-pulsed APCs either expressing the proteins
or polypeptides described above or together with
a stimulatory form of the protein or polypeptide and
reintroducing the in vitro primed T cells into the patient.
The infected cells would now be capable of delivering a
costimulatory signal to T cells in vivo, thereby activating
the T cells.

[0308] In another application, upregulation or
enhancement of antigen function (preferably B lymphocyte
antigen function) may be useful in the induction of tumor
immunity. Tumor cells (e.g., sarcoma, melanoma,
lymphoma, leukemia, neuroblastoma, carcinoma) transfected
with one of the above-described nucleic acids encoding
a protein or polypeptide can be administered to
a subject to overcome tumor-specific tolerance in the
subject. If desired, the tumor cell can be transfected to
express a combination of peptides. For example, tumor
cells obtained from a patient can be transfected ex vivo
with an expression vector directing the expression of a
peptide having B7-2-like activity alone, or in conjunction
with a peptide having B7-1-like activity and/or B7-3-like
activity. The transfected tumor cells are returned to the
patient to result in expression of the peptides on the
surface of the transfected cell. Alternatively, gene therapy
techniques can be used to target a tumor cell for
transfection in vivo.

[0309] The presence of the protein or polypeptide
encoded by the nucleic acids described above having the
activity of a B lymphocyte antigen(s) on the surface of
the tumor cell provides the necessary costimulation signal
to T cells to induce a T cell mediated immune response
against the transfected tumor cells. In addition,
tumor cells which lack or which fail to reexpress sufficient
amounts of MHC class I or MHC class II molecules
can be transfected with nucleic acids encoding all or a
portion (e.g., a cytoplasmic-domain truncated portion)
of an MHC class I α chain and β2 microglobulin or an
MHC class II α chain and an MHC class II β chain to
thereby express MHC class I or MHC class II proteins
on the cell surface, respectively. Expression of the
appropriate MHC class I or class II molecules in conjunction
with a peptide having the activity of a B lymphocyte
antigen (e.g., B7-1, B7-2, B7-3) induces a T cell mediated
immune response against the transfected tumor
cell. Optionally, a nucleic acid encoding an antisense
construct which blocks expression of an MHC class II
associated protein, such as the invariant chain, can also
be cotransfected with a DNA encoding a protein or
polypeptide having the activity of a B lymphocyte antigen
to promote presentation of tumor associated antigens
and induce tumor specific immunity. Thus, the
induction of a T cell mediated immune response in a human
subject may be sufficient to overcome tumor-specific
tolerance in the subject. Alternatively, as described
in more detail below, nucleic acids encoding these immune
system regulator proteins or polypeptides or nucleic
acids regulating the expression of such proteins or
polypeptides may be introduced into appropriate host
cells to increase or decrease the expression of the
proteins as desired.

EXAMPLE 24 

Assaying the Expressed Proteins or Polypeptides for
Hematopoiesis Regulating Activity

[0310] The proteins or polypeptides encoded by the
nucleic acids described above may also be evaluated
for their hematopoiesis regulating activity. For example,
the effect of the proteins or polypeptides on embryonic
stem cell differentiation may be evaluated. Numerous
assays for such activity are familiar to those skilled in
the art, including the assays disclosed in the following
references: Johansson et al., Cell Biol. 15:141-151,
1995; Keller et al., Mol. Cell Biol. 13:473-486, 1993;
McClanahan et al., Blood 81:2903-2915, 1993.
[0311] The proteins or polypeptides encoded by the
nucleic acids described above may also be evaluated
for their influence on the lifetime of stem cells and stem
cell differentiation. Numerous assays for such activity
are familiar to those skilled in the art, including the
assays disclosed in the following references: Freshney, M.G.
Methylcellulose Colony Forming Assays, in Culture
of Hematopoietic Cells. R.I. Freshney, et al. Eds. pp.
265-268, Wiley-Liss, Inc., New York, NY. 1994; Hirayama
et al., Proc. Natl. Acad. Sci. USA 89:5907-5911,
1992; McNiece, I.K. and Briddell, R.A. Primitive
Hematopoietic Colony Forming Cells with High Proliferative
Potential, in Culture of Hematopoietic Cells. R.I. Freshney,
et al. eds. Vol pp. 23-39, Wiley-Liss, Inc., New York,
NY, 1994; Neben et al., Experimental Hematology 22:
353-359, 1994; Ploemacher, R.E. Cobblestone Area
Forming Cell Assay, in Culture of Hematopoietic Cells.
R.I. Freshney, et al. Eds. pp. 1-21, Wiley-Liss, Inc., New
York, NY. 1994; Spooncer, E., Dexter, M. and Allen, T.
Long Term Bone Marrow Cultures in the Presence of
Stromal Cells, in Culture of Hematopoietic Cells. R.I.
Freshney, et al. Eds. pp. 163-179, Wiley-Liss, Inc., New
York, NY. 1994; and Sutherland, H.J. Long Term Culture
Initiating Cell Assay, in Culture of Hematopoietic Cells.
R.I. Freshney, et al. Eds. pp. 139-162, Wiley-Liss, Inc.,
New York, NY. 1994.

[0312] Those proteins or polypeptides which exhibit
hematopoiesis regulatory activity may then be formulated
as pharmaceuticals and used to treat clinical conditions
in which regulation of hematopoiesis is beneficial.
For example, a protein or polypeptide of the present
invention may be useful in regulation of hematopoiesis
and, consequently, in the treatment of myeloid or lymphoid
cell deficiencies. Even marginal biological activity
in support of colony forming cells or of factor-dependent
cell lines indicates involvement in regulating
hematopoiesis, e.g. in supporting the growth and proliferation of
erythroid progenitor cells alone or in combination with
other cytokines, thereby indicating utility, for example,
in treating various anemias or for use in conjunction with
irradiation/chemotherapy to stimulate the production of
erythroid precursors and/or erythroid cells; in supporting
the growth and proliferation of myeloid cells such as
granulocytes and monocytes/macrophages (i.e.,
traditional CSF activity) useful, for example, in conjunction
with chemotherapy to prevent or treat consequent
myelo-suppression; in supporting the growth and proliferation
of megakaryocytes and consequently of platelets
thereby allowing prevention or treatment of various
platelet disorders such as thrombocytopenia, and generally
for use in place of or complementary to platelet
transfusions; and/or in supporting the growth and
proliferation of hematopoietic stem cells which are capable
of maturing to any and all of the above-mentioned
hematopoietic cells and therefore find therapeutic utility in
various stem cell disorders (such as those usually treated
with transplantation, including, without limitation,
aplastic anemia and paroxysmal nocturnal
hemoglobinuria), as well as in repopulating the stem cell
compartment post irradiation/chemotherapy, either in-vivo or
ex-vivo (i.e., in conjunction with bone marrow transplantation
or with peripheral progenitor cell transplantation
(homologous or heterologous)) as normal cells or
genetically manipulated for gene therapy. Alternatively, as
described in more detail below, nucleic acids encoding
these proteins or polypeptides or nucleic acids regulating
the expression of these proteins or polypeptides may
be introduced into appropriate host cells to increase or
decrease the expression of the proteins as desired.

EXAMPLE 25 

Assaying the Expressed Proteins or Polypeptides for
Regulation of Tissue Growth

[0313] The proteins or polypeptides encoded by the
nucleic acids described above may also be evaluated
for their effect on tissue growth. Numerous assays for
such activity are familiar to those skilled in the art,
including the assays disclosed in International Patent
Publication No. WO95/16035, International Patent
Publication No. WO95/05846 and International Patent
Publication No. WO91/07491.

[0314] Assays for wound healing activity include,
without limitation, those described in: Winter, Epidermal
Wound Healing, pps. 71-112 (Maibach, HI and Rovee,
DT, eds.), Year Book Medical Publishers, Inc., Chicago,
as modified by Eaglstein and Mertz, J. Invest. Dermatol.
71:382-84 (1978).

[0315] Those proteins or polypeptides which are
involved in the regulation of tissue growth may then be
formulated as pharmaceuticals and used to treat clinical
conditions in which regulation of tissue growth is
beneficial. For example, a protein or polypeptide may have
utility in compositions used for bone, cartilage, tendon,
ligament and/or nerve tissue growth or regeneration, as
well as for wound healing and tissue repair and
replacement, and in the treatment of burns, incisions and ulcers.
[0316] A protein or polypeptide encoded by the nucle- 
ic acids described above which induces cartilage and/ 
or bone growth in circumstances where bone is not nor- 
mally formed, has application in the healing of bone frac- 
tures and cartilage damage or defects in humans and 
other animals. Such a preparation employing a protein 
or polypeptide of the invention may have prophylactic 
use in closed as well as open fracture reduction and also 
in the improved fixation of artificial joints. De novo bone 
synthesis induced by an osteogenic agent contributes 
to the repair of congenital, trauma induced, or oncologic 
resection induced craniofacial defects, and also is use- 
ful in cosmetic plastic surgery. 

[0317] A protein or polypeptide of this invention may 
also be used in the treatment of periodontal disease, 
and in other tooth repair processes. Such agents may 
provide an environment to attract bone-forming cells, 
stimulate growth of bone-forming cells or induce differ- 
entiation of progenitors of bone-forming cells. A protein 
of the invention may also be useful in the treatment of 
osteoporosis or osteoarthritis, such as through stimula- 
tion of bone and/or cartilage repair or by blocking inflam- 
mation or processes of tissue destruction (collagenase 
activity, osteoclast activity, etc.) mediated by inflamma- 
tory processes. 

[0318] Another category of tissue regeneration activity
that may be attributable to the proteins or polypeptides
encoded by the nucleic acids described above is
tendon/ligament formation. A protein or polypeptide
encoded by the nucleic acids described above, which
induces tendon/ligament-like tissue or other tissue formation
in circumstances where such tissue is not normally
formed, has application in the healing of tendon or
ligament tears, deformities and other tendon or ligament
defects in humans and other animals. Such a preparation
employing a tendon/ligament-like tissue inducing protein
may have prophylactic use in preventing damage
to tendon or ligament tissue, as well as use in the
improved fixation of tendon or ligament to bone or other
tissues, and in repairing defects to tendon or ligament
tissue. De novo tendon/ligament-like tissue formation
induced by a protein or polypeptide of the present
invention contributes to the repair of tendon or ligament
defects of congenital, traumatic or other origin and is
also useful in cosmetic plastic surgery for attachment or
repair of tendons or ligaments. The proteins or polypeptides
of the present invention may provide an environment
to attract tendon- or ligament-forming cells,
stimulate growth of tendon- or ligament-forming cells, induce
differentiation of progenitors of tendon- or ligament-
forming cells, or induce growth of tendon/ligament cells
or progenitors ex vivo for return in vivo to effect tissue
repair. The proteins or polypeptides of the invention may
also be useful in the treatment of tendinitis, carpal tunnel
syndrome and other tendon or ligament defects. The
therapeutic compositions may also include an appropriate
matrix and/or sequestering agent as a carrier as is
well known in the art.

[0319] The proteins or polypeptides of the present
invention may also be useful for proliferation of neural
cells and for regeneration of nerve and brain tissue,
i.e., for the treatment of central and peripheral nervous
system diseases and neuropathies, as well as mechanical
and traumatic disorders, which involve degeneration,
death or trauma to neural cells or nerve tissue.
More specifically, a protein or polypeptide may be used
in the treatment of diseases of the peripheral nervous
system, such as peripheral nerve injuries, peripheral
neuropathy and localized neuropathies, and central
nervous system diseases, such as Alzheimer's,
Parkinson's disease, Huntington's disease, amyotrophic lateral
sclerosis, and Shy-Drager syndrome. Further conditions
which may be treated in accordance with the
present invention include mechanical and traumatic
disorders, such as spinal cord disorders, head trauma and
cerebrovascular diseases such as stroke. Peripheral
neuropathies resulting from chemotherapy or other
medical therapies may also be treatable using a protein
or polypeptide of the invention.

[0320] Proteins or polypeptides of the invention may
also be useful to promote better or faster closure of
non-healing wounds, including without limitation pressure
ulcers, ulcers associated with vascular insufficiency,
surgical and traumatic wounds, and the like.
[0321] It is expected that a protein or polypeptide of
the present invention may also exhibit activity for
generation or regeneration of other tissues, such as organs
(including, for example, pancreas, liver, intestine,
kidney, skin, endothelium), muscle (smooth, skeletal or
cardiac) and vascular (including vascular endothelium)
tissue, or for promoting the growth of cells comprising such
tissues. Part of the desired effects may be by inhibition
or modulation of fibrotic scarring to allow normal tissue
to generate. A protein or polypeptide of the invention
may also exhibit angiogenic activity.

[0322] A protein or polypeptide of the present invention
may also be useful for gut protection or regeneration
and treatment of lung or liver fibrosis, reperfusion injury
in various tissues, and conditions resulting from systemic
cytokine damage.

[0323] A protein or polypeptide of the present invention
may also be useful for promoting or inhibiting
differentiation of tissues described above from precursor
tissues or cells; or for inhibiting the growth of tissues
described above.

[0324] Alternatively, as described in more detail below,
nucleic acids encoding tissue growth regulating activity
proteins or polypeptides or nucleic acids regulating
the expression of such proteins or polypeptides may be
introduced into appropriate host cells to increase or
decrease the expression of the proteins as desired.

EXAMPLE 26 

Assaying the Expressed Proteins or Polypeptides for
Regulation of Reproductive Hormones

[0325] The proteins or polypeptides of the present
invention may also be evaluated for their ability to regulate
reproductive hormones, such as follicle stimulating
hormone. Numerous assays for such activity are familiar to
those skilled in the art, including the assays disclosed
in the following references: Vale et al., Endocrinol. 91:
562-572, 1972; Ling et al., Nature 321:779-782, 1986;
Vale et al., Nature 321:776-779, 1986; Mason et al.,
Nature 318:659-663, 1985; Forage et al., Proc. Natl. Acad.
Sci. USA 83:3091-3095, 1986; Chapter 6.12 in Current
Protocols in Immunology, J.E. Coligan et al. Eds.
Greene Publishing Associates and Wiley-Interscience;
Taub et al., J. Clin. Invest. 95:1370-1376, 1995; Lind et
al., APMIS 103:140-146, 1995; Muller et al., Eur. J.
Immunol. 25:1744-1748; Gruber et al., J. Immunol. 152:
5860-5867, 1994; Johnston et al., J. Immunol. 153:
1762-1768, 1994.

[0326] Those proteins or polypeptides which exhibit
activity as reproductive hormones or regulators of cell
movement may then be formulated as pharmaceuticals
and used to treat clinical conditions in which regulation
of reproductive hormones is beneficial. For example,
a protein or polypeptide may exhibit activin- or inhibin-
related activities. Inhibins are characterized by their
ability to inhibit the release of follicle stimulating
hormone (FSH), while activins are characterized by their
ability to stimulate the release of FSH. Thus, a protein
or polypeptide of the present invention, alone or in
heterodimers with a member of the inhibin α family, may be
useful as a contraceptive based on the ability of inhibins
to decrease fertility in female mammals and decrease
spermatogenesis in male mammals. Administration of
sufficient amounts of other inhibins can induce infertility
in these mammals. Alternatively, the protein or polypeptide
of the invention, as a homodimer or as a heterodimer
with other protein subunits of the inhibin-β group, may
be useful as a fertility inducing therapeutic, based upon
the ability of activin molecules to stimulate FSH release
from cells of the anterior pituitary. See, for example,
United States Patent 4,798,885. A protein or
polypeptide of the invention may also be useful for
advancement of the onset of fertility in sexually immature
mammals, so as to increase the lifetime reproductive
performance of domestic animals such as cows, sheep
and pigs.
[0327] Alternatively, as described in more detail below,
nucleic acids encoding reproductive hormone regulating
activity proteins or polypeptides or nucleic acids
regulating the expression of such proteins or polypeptides
may be introduced into appropriate host cells to
increase or decrease the expression of the proteins or
polypeptides as desired.

EXAMPLE 27 

Assaying the Expressed Proteins or Polypeptides For 
Chemotactic/Chemokinetic Activity 

[0328] The proteins or polypeptides of the present
invention may also be evaluated for chemotactic/chemokinetic
activity. For example, a protein or polypeptide
of the present invention may have chemotactic or
chemokinetic activity (e.g., act as a chemokine) for mammalian
cells, including, for example, monocytes, fibroblasts,
neutrophils, T-cells, mast cells, eosinophils,
epithelial and/or endothelial cells. Chemotactic and
chemokinetic proteins or polypeptides can be used to mobilize
or attract a desired cell population to a desired site
of action. Chemotactic or chemokinetic proteins or
polypeptides provide particular advantages in treatment
of wounds and other trauma to tissues, as well as in
treatment of localized infections. For example, attraction
of lymphocytes, monocytes or neutrophils to tumors or
sites of infection may result in improved immune
responses against the tumor or infecting agent.

[0329] A protein or polypeptide has chemotactic activity
for a particular cell population if it can stimulate,
directly or indirectly, the directed orientation or movement
of such cell population. Preferably, the protein or
polypeptide has the ability to directly stimulate directed
movement of cells. Whether a particular protein or
polypeptide has chemotactic activity for a population of
cells can be readily determined by employing such protein
or polypeptide in any known assay for cell
chemotaxis.

[0330] The activity of a protein or polypeptide of the 
invention may, among other means, be measured by the 
following methods: 

[0331] Assays for chemotactic activity (which will
identify proteins or polypeptides that induce or prevent
chemotaxis) consist of assays that measure the ability
of a protein or polypeptide to induce the migration of
cells across a membrane as well as the ability of a protein
or polypeptide to induce the adhesion of one cell
population to another cell population. Suitable assays
for movement and adhesion include, without limitation,
those described in: Current Protocols in Immunology,
Ed. by J.E. Coligan, A.M. Kruisbeek, D.H. Margulies,
E.M. Shevach, W. Strober, Pub. Greene Publishing
Associates and Wiley-Interscience, Chapter 6.12:
6.12.1-6.12.28; Taub et al., J. Clin. Invest. 95:
1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995;
Mueller et al., Eur. J. Immunol. 25:1744-1748; Gruber
et al., J. Immunol. 152:5860-5867, 1994; Johnston et al.,
J. Immunol. 153:1762-1768, 1994.

EXAMPLE 28 

Assaying the Expressed Proteins or Polypeptides for 
Regulation of Blood Clotting 

[0332] The proteins or polypeptides of the present
invention may also be evaluated for their effects on blood
clotting. Numerous assays for such activity are familiar
to those skilled in the art, including the assays disclosed
in the following references: Linet et al., J. Clin.
Pharmacol. 26:131-140, 1986; Burdick et al., Thrombosis Res.
45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79
(1991); Schaub, Prostaglandins 35:467-474, 1988.
[0333] Those proteins or polypeptides which are
involved in the regulation of blood clotting may then be
formulated as pharmaceuticals and used to treat clinical
conditions in which regulation of blood clotting is
beneficial. For example, a protein or polypeptide of the
invention may also exhibit hemostatic or thrombolytic activity.
As a result, such a protein or polypeptide is expected to
be useful in treatment of various coagulation disorders
(including hereditary disorders, such as hemophilias) or
to enhance coagulation and other hemostatic events in
treating wounds resulting from trauma, surgery or other
causes. A protein or polypeptide of the invention may
also be useful for dissolving or inhibiting formation of
thromboses and for treatment and prevention of conditions
resulting therefrom (such as infarction of cardiac
and central nervous system vessels (e.g., stroke)).
Alternatively, as described in more detail below, nucleic
acids encoding blood clotting activity proteins or
polypeptides or nucleic acids regulating the expression
of such proteins or polypeptides may be introduced into
appropriate host cells to increase or decrease the
expression of the proteins or polypeptides as desired.

EXAMPLE 29 

Assaying the Expressed Proteins or Polypeptides for 
Involvement in Receptor/Ligand Interactions 

[0334] The proteins or polypeptides of the present
invention may also be evaluated for their involvement in
receptor/ligand interactions. Numerous assays for such
involvement are familiar to those skilled in the art,
including the assays disclosed in the following references:
Chapter 7 (7.28.1-7.28.22) in Current Protocols in
Immunology, J.E. Coligan et al. Eds. Greene Publishing
Associates and Wiley-Interscience; Takai et al., Proc.
Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et al.,
J. Exp. Med. 168:1145-1156, 1988; Rosenstein et al., J.
Exp. Med. 169:149-160, 1989; Stoltenborg et al., J.
Immunol. Methods 175:59-68, 1994; Stitt et al., Cell 80:
661-670, 1995; Gyuris et al., Cell 75:791-803, 1993.
[0335] For example, the proteins or polypeptides of
the present invention may also demonstrate activity as
receptors, receptor ligands or inhibitors or agonists of
receptor/ligand interactions. Examples of such receptors
and ligands include, without limitation, cytokine
receptors and their ligands, receptor kinases and their
ligands, receptor phosphatases and their ligands, receptors
involved in cell-cell interactions and their ligands
(including without limitation, cellular adhesion molecules
(such as selectins, integrins and their ligands) and
receptor/ligand pairs involved in antigen presentation,
antigen recognition and development of cellular and
humoral immune responses). Receptors and ligands are
also useful for screening of potential peptide or small
molecule inhibitors of the relevant receptor/ligand
interaction. A protein or polypeptide of the present invention
(including, without limitation, fragments of receptors and
ligands) may be useful as inhibitors of receptor/ligand
interactions. Alternatively, as described in more detail below,
nucleic acids encoding proteins or polypeptides
involved in receptor/ligand interactions or nucleic acids
regulating the expression of such proteins or polypeptides
may be introduced into appropriate host cells to
increase or decrease the expression of the proteins or
polypeptides as desired.

EXAMPLE 30 

Assaying the Proteins or Polypeptides for Anti- 
Inflammatory Activity 

[0336] The proteins or polypeptides of the present
invention may also be evaluated for anti-inflammatory
activity. The anti-inflammatory activity may be achieved by
providing a stimulus to cells involved in the inflammatory
response, by inhibiting or promoting cell-cell interactions
(such as, for example, cell adhesion), by inhibiting
or promoting chemotaxis of cells involved in the
inflammatory process, inhibiting or promoting cell extravasation,
or by stimulating or suppressing production of other
factors which more directly inhibit or promote an
inflammatory response. Proteins or polypeptides exhibiting
such activities can be used to treat inflammatory conditions
including chronic or acute conditions, including
without limitation inflammation associated with infection
(such as septic shock, sepsis or systemic inflammatory
response syndrome), ischemia-reperfusion injury,
endotoxin lethality, arthritis, complement-mediated
hyperacute rejection, nephritis, cytokine- or chemokine-induced
lung injury, inflammatory bowel disease, Crohn's
disease or resulting from over production of cytokines
such as TNF or IL-1. Proteins or polypeptides of the
invention may also be useful to treat anaphylaxis and
hypersensitivity to an antigenic substance or material.
Alternatively, as described in more detail below, nucleic
acids encoding anti-inflammatory activity proteins or
polypeptides or nucleic acids regulating the expression
of such proteins or polypeptides may be introduced into
appropriate host cells to increase or decrease the
expression of the proteins or polypeptides as desired.

EXAMPLE 31

Assaying the Expressed Proteins or Polypeptides for
Tumor Inhibition Activity

[0337] The proteins or polypeptides of the present
invention may also be evaluated for tumor inhibition
activity. In addition to the activities described above for
immunological treatment or prevention of tumors, a protein
or polypeptide of the invention may exhibit other
anti-tumor activities. A protein or polypeptide may inhibit
tumor growth directly or indirectly (such as, for example,
via ADCC). A protein or polypeptide may exhibit its
tumor inhibitory activity by acting on tumor tissue or tumor
precursor tissue, by inhibiting formation of tissues
necessary to support tumor growth (such as, for example,
by inhibiting angiogenesis), by causing production of
other factors, agents or cell types which inhibit tumor
growth, or by suppressing, eliminating or inhibiting
factors, agents or cell types which promote tumor growth.
Alternatively, as described in more detail below, nucleic
acids encoding proteins or polypeptides with tumor
inhibition activity or nucleic acids regulating the expression
of such proteins or polypeptides may be introduced
into appropriate host cells to increase or decrease the
expression of the proteins or polypeptides as desired.
[0338] A protein or polypeptide of the invention may
also exhibit one or more of the following additional
activities or effects: inhibiting the growth, infection or
function of, or killing, infectious agents, including, without
limitation, bacteria, viruses, fungi and other parasites;
affecting (suppressing or enhancing) bodily characteristics,
including, without limitation, height, weight, hair
color, eye color, skin, fat to lean ratio or other tissue
pigmentation, or organ or body part size or shape (such as,
for example, breast augmentation or diminution, change
in bone form or shape); affecting biorhythms or circadian
cycles or rhythms; affecting the fertility of male or female
subjects; affecting the metabolism, catabolism, anabolism,
processing, utilization, storage or elimination of
dietary fat, lipid, protein, carbohydrate, vitamins, minerals,
cofactors or other nutritional factors or component(s);
affecting behavioral characteristics, including, without
limitation, appetite, libido, stress, cognition (including
cognitive disorders), depression (including depressive
disorders) and violent behaviors; providing analgesic
effects or other pain reducing effects; promoting
differentiation and growth of embryonic stem cells in lineages
other than hematopoietic lineages; hormonal or
endocrine activity; in the case of enzymes, correcting
deficiencies of the enzyme and treating deficiency-related
diseases; treatment of hyperproliferative disorders
(such as, for example, psoriasis); immunoglobulin-like
activity (such as, for example, the ability to bind antigens
or complement); and the ability to act as an antigen in
a vaccine composition to raise an immune response
against such protein or another material or entity which
is cross-reactive with such protein. Alternatively, as
described in more detail below, nucleic acids encoding
proteins or polypeptides involved in any of the above
mentioned activities or nucleic acids regulating the expression
of such proteins may be introduced into appropriate
host cells to increase or decrease the expression of the
proteins or polypeptides as desired.

EXAMPLE 32

Identification of Proteins or Polypeptides which Interact 
with Proteins or Polypeptides of the Present Invention 

[0339] Proteins or polypeptides which interact with the proteins or
polypeptides of the present invention, such as receptor proteins, may
be identified using two hybrid systems such as the Matchmaker Two
Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the
manual accompanying the kit, nucleic acids encoding the proteins or
polypeptides of the present invention are inserted into an expression
vector such that they are in frame with DNA encoding the DNA binding
domain of the yeast transcriptional activator GAL4. cDNAs in a cDNA
library which encode proteins or polypeptides which might interact
with the proteins or polypeptides of the present invention are
inserted into a second expression vector such that they are in frame
with DNA encoding the activation domain of GAL4. The two expression
plasmids are transformed into yeast and the yeast are plated on
selection medium which selects for expression of selectable markers
on each of the expression vectors as well as GAL4 dependent
expression of the HIS3 gene. Transformants capable of growing on
medium lacking histidine are screened for GAL4 dependent lacZ
expression. Those cells which are positive in both the histidine
selection and the lacZ assay contain plasmids encoding proteins or
polypeptides which interact with the proteins or polypeptides of the
present invention.
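
The two-step selection described above can be sketched as a simple filter. This is an illustrative sketch only: the `Transformant` record and its field names are hypothetical bookkeeping, not part of the Matchmaker kit or its manual. It shows that only clones scoring positive in both the histidine selection and the lacZ assay are retained as candidate interactors.

```python
from dataclasses import dataclass


@dataclass
class Transformant:
    """Hypothetical record of one yeast transformant from the screen."""
    clone_id: str
    grows_without_histidine: bool  # HIS3 reporter selection
    lacz_positive: bool            # lacZ reporter assay


def candidate_interactors(transformants):
    """Keep only clones positive in BOTH reporter assays."""
    return [
        t.clone_id
        for t in transformants
        if t.grows_without_histidine and t.lacz_positive
    ]


# Made-up example data for illustration.
colonies = [
    Transformant("clone-1", True, True),    # positive in both assays
    Transformant("clone-2", True, False),   # histidine-positive only
    Transformant("clone-3", False, False),  # negative
]
print(candidate_interactors(colonies))
```

Requiring both reporters reflects the text's criterion that interacting clones are positive in the histidine selection and in the lacZ assay.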

[0340] Alternatively, the system described in Lustig et al., Methods
in Enzymology 283: 83-99 (1997), may be used for identifying
molecules which interact with the proteins or polypeptides of the
present invention. In such systems, in vitro transcription reactions
are performed on a pool of vectors containing nucleic acid inserts
which encode the proteins or polypeptides of the present invention.
The nucleic acid inserts are cloned downstream of a promoter which
drives in vitro transcription. The resulting pools of mRNAs are
introduced into Xenopus laevis oocytes. The oocytes are then assayed
for a desired activity.

[0341] Alternatively, the pooled in vitro transcription products
produced as described above may be translated in vitro. The pooled in
vitro translation products can be assayed for a desired activity or
for interaction with a known protein or polypeptide.

[0342] Proteins, polypeptides or other molecules interacting with
proteins or polypeptides of the present invention can be found by a
variety of additional techniques. In one method, affinity columns
containing the protein or polypeptide of the present invention can be
constructed. In some versions of this method, the affinity column
contains chimeric proteins in which the protein or polypeptide of the
present invention is fused to glutathione S-transferase. A mixture of
cellular proteins or pool of expressed proteins as described above is
applied to the affinity column. Molecules interacting with the
protein or polypeptide attached to the column can then be isolated
and analyzed on 2-D electrophoresis gel as described in Ramunsen et
al., Electrophoresis, 18, 588-598 (1997). Alternatively, the
molecules retained on the affinity column can be purified by
electrophoresis based methods and sequenced. The same method can be
used to isolate antibodies, to screen phage display products, or to
screen phage display human antibodies.
[0343] Molecules interacting with the proteins or polypeptides of the
present invention can also be screened by using an Optical Biosensor
as described in Edwards & Leatherbarrow, Analytical Biochemistry,
246, 1-6 (1997). The main advantage of the method is that it allows
the determination of the association rate between the protein or
polypeptide and other interacting molecules. Thus, it is possible to
specifically select interacting molecules with a high or low
association rate. Typically a target molecule is linked to the sensor
surface (through a carboxymethyl dextran matrix) and a sample of test
molecules is placed in contact with the target molecules. The binding
of a test molecule to the target molecule causes a change in the
refractive index and/or thickness. This change is detected by the
Biosensor provided it occurs in the evanescent field (which extends a
few hundred nanometers from the sensor surface). In these screening
assays, the target molecule can be one of the proteins or
polypeptides of the present invention and the test sample can be a
collection of proteins, polypeptides or other molecules extracted
from tissues or cells, a pool of expressed proteins, combinatorial
peptide and/or chemical libraries, or phage displayed peptides. The
tissues or cells from which the test molecules are extracted can
originate from any species.
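
As an illustration of how an association rate might be extracted from biosensor data, the following sketch assumes a simple 1:1 binding model, which is an assumption made here and is not stated in the text. Under that model the observed rate at analyte concentration C is k_obs = ka*C + kd, so a straight-line fit of k_obs against C gives the association rate constant ka as the slope. All function names and numeric values are hypothetical.

```python
import math


def response(t, conc, ka, kd, rmax):
    """Sensor response at time t for an assumed 1:1 binding model.

    R(t) = Req * (1 - exp(-k_obs * t)), with k_obs = ka*conc + kd
    and Req = rmax * ka*conc / k_obs.
    """
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t))


def fit_rates(concs, k_obs_values):
    """Least-squares line k_obs = ka*C + kd; returns (ka, kd)."""
    n = len(concs)
    mean_c = sum(concs) / n
    mean_k = sum(k_obs_values) / n
    sxx = sum((c - mean_c) ** 2 for c in concs)
    sxy = sum((c - mean_c) * (k - mean_k)
              for c, k in zip(concs, k_obs_values))
    ka = sxy / sxx
    kd = mean_k - ka * mean_c
    return ka, kd


# Hypothetical observed rates at three analyte concentrations (molar).
concs = [1e-8, 2e-8, 4e-8]
k_obs_values = [1e5 * c + 1e-3 for c in concs]
ka, kd = fit_rates(concs, k_obs_values)
print(f"ka = {ka:.3g} per M per s, kd = {kd:.3g} per s")
```

Test molecules could then be ranked by the fitted ka to select those with a high or low association rate.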
[0344] In other methods, a target protein or polypep- 
tide is immobilized and the test population is a collection 
of unique proteins or polypeptides of the present inven- 
tion. 

[0345] To study the interaction of the proteins or polypeptides of
the present invention with drugs, the microdialysis coupled to HPLC
method described by Wang et al., Chromatographia, 44, 205-208 (1997)
or the affinity capillary electrophoresis method described by Busch
et al., J. Chromatogr. 777:311-328 (1997) can be used.

[0346] The system described in U.S. Patent No. 5,654,150 may also be
used to identify molecules which interact with the proteins or
polypeptides of the present invention. In this system, pools of
nucleic acids encoding the proteins or polypeptides of the present
invention are transcribed and translated in vitro and the reaction
products are assayed for interaction with a known polypeptide or
antibody.

[0347] It will be appreciated by those skilled in the art that the
proteins or polypeptides of the present invention may be assayed for
numerous activities in addition to those specifically enumerated
above. For example, the expressed proteins or polypeptides may be
evaluated for applications involving control and regulation of
inflammation, tumor proliferation or metastasis, infection, or other
clinical conditions. In addition, the proteins or polypeptides may be
useful as nutritional agents or cosmetic agents.

[0348] The proteins or polypeptides of the present invention may be
used to generate antibodies capable of specifically binding to the
proteins or polypeptides of the present invention. The antibodies may
be monoclonal antibodies or polyclonal antibodies. As used herein,
"antibody" refers to a polypeptide or group of polypeptides which are
comprised of at least one binding domain, where a binding domain is
formed from the folding of variable domains of an antibody molecule
to form three-dimensional binding spaces with an internal surface
shape and charge distribution complementary to the features of an
antigenic determinant of an antigen, which allows an immunological
reaction with the antigen. Antibodies include recombinant proteins
comprising the binding domains, as well as fragments, including Fab,
Fab', F(ab)2, and F(ab')2 fragments.

[0349] As used herein, an "antigenic determinant" is the portion of
an antigen molecule that determines the specificity of the
antigen-antibody reaction. An "epitope" refers to an antigenic
determinant of a polypeptide. An epitope can comprise as few as 3
amino acids in a spatial conformation which is unique to the epitope.
Generally an epitope consists of at least 6 such amino acids, and
more usually at least 8-10 such amino acids. Methods for determining
the amino acids which make up an epitope include x-ray
crystallography, 2-dimensional nuclear magnetic resonance, and
epitope mapping, e.g. the Pepscan method described by H. Mario Geysen
et al. 1984. Proc. Natl. Acad. Sci. U.S.A. 81:3998-4002; PCT
Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.
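
Epitope mapping with overlapping synthetic peptides can be illustrated by enumerating every window of a fixed length along a polypeptide. The sketch below is a hedged illustration rather than the published protocol: the function name, the default window of 6 residues (taken from the "at least 6 such amino acids" figure above) and the example sequence are all assumptions.

```python
def overlapping_peptides(sequence, length=6, step=1):
    """Return (start_position, peptide) pairs covering the sequence.

    Positions are 1-based. A sequence shorter than `length` yields
    an empty list.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = len(sequence) - length
    return [
        (i + 1, sequence[i:i + length])
        for i in range(0, last_start + 1, step)
    ]


# Hypothetical 8-residue polypeptide scanned with hexapeptides.
for position, peptide in overlapping_peptides("MKTAYIAK"):
    print(position, peptide)
```

Each peptide would then be tested for antibody binding, and the reactive windows indicate which residues contribute to the epitope.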

[0350] In some embodiments, the antibodies may be capable of
specifically binding to a protein or polypeptide encoded by
EST-related nucleic acids, fragments of EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids. In some
embodiments, the antibody may be capable of binding an antigenic
determinant or an epitope in a protein or polypeptide encoded by
EST-related nucleic acids, fragments of EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids.

[0351] In other embodiments, the antibodies may be capable of
specifically binding to an EST-related polypeptide, fragment of an
EST-related polypeptide, positional segment of an EST-related
polypeptide or fragment of a positional segment of an EST-related
polypeptide. In some embodiments, the antibody may be capable of
binding an antigenic determinant or an epitope in an EST-related
polypeptide, fragment of an EST-related polypeptide, positional
segment of an EST-related polypeptide or fragment of a positional
segment of an EST-related polypeptide.

[0352] In the case of secreted proteins, the antibodies 
may be capable of binding a full-length protein encoded 
by a nucleic acid of the present invention, a mature pro- 
tein (i.e. the protein generated by cleavage of the signal 
peptide) encoded by a nucleic acid of the present inven- 
tion, or a signal peptide encoded by a nucleic acid of the 
present invention. 

EXAMPLE 33 

Production of an Antibody to a Human Polypeptide or 
Protein 

[0353] The above described EST-related nucleic ac- 
ids, fragments of EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids or nu- 
cleic acids encoding EST-related polypeptides, frag- 
ments of EST-related polypeptides, positional segments 
of EST-related polypeptides or fragments of positional 
segments of EST-related polypeptides are operably 
linked to promoters and introduced into cells as described above.

[0354] In the case of secreted proteins, nucleic acids 
encoding the full protein (i.e. the mature protein and the 
signal peptide), nucleic acids encoding the mature pro- 
tein (i.e. the protein generated by cleavage of the signal 
peptide), or nucleic acids encoding the signal peptide 
are operably linked to promoters and introduced into
cells as described above. 

[0355] The encoded proteins or polypeptides are then 
substantially purified or isolated as described above. 
The concentration of protein in the final preparation is 
adjusted, for example, by concentration on an Amicon 
filter device, to the level of a few µg/ml. Monoclonal or
polyclonal antibody to the protein or polypeptide can 
then be prepared as follows: 

1. Monoclonal Antibody Production by Hybridoma Fusion

[0356] Monoclonal antibody to epitopes of any of the proteins or
polypeptides identified and isolated as described can be prepared
from murine hybridomas according to the classical method of Kohler
and Milstein, Nature 256:495 (1975) or derivative methods thereof.
Briefly, a mouse is repetitively inoculated with a few micrograms of
the selected protein or peptides derived therefrom over a period of a
few weeks. The mouse is then sacrificed, and the antibody-producing
cells of the spleen isolated. The spleen cells are fused by means of
polyethylene glycol with mouse myeloma cells, and the excess unfused
cells destroyed by growth of the system on selective media comprising
aminopterin (HAT media). The successfully fused cells are diluted and
aliquots of the dilution placed in wells of a microtiter plate where
growth of the culture is continued. Antibody-producing clones are
identified by detection of antibody in the supernatant fluid of the
wells by immunoassay procedures, such as ELISA, as originally
described by Engvall, Meth. Enzymol. 70:419 (1980). Selected positive
clones can be expanded and their monoclonal antibody product
harvested for use. Detailed procedures for monoclonal antibody
production are described in Davis, L. et al. in Basic Methods in
Molecular Biology, Elsevier, New York, Section 21-2.

2. Polyclonal Antibody Production by Immunization

[0357] Polyclonal antiserum containing antibodies to heterogeneous
epitopes of a single protein or polypeptide can be prepared by
immunizing suitable animals with the expressed protein or peptides
derived therefrom, which can be unmodified or modified to enhance
immunogenicity. Effective polyclonal antibody production is affected
by many factors related both to the antigen and the host species. For
example, small molecules tend to be less immunogenic than others and
may require the use of carriers and adjuvant. Also, host animals'
responses vary depending on site of inoculation and dose, with both
inadequate and excessive doses of antigen resulting in low titer
antisera. Small doses (ng level) of antigen administered at multiple
intradermal sites appear to be most reliable. An effective
immunization protocol for rabbits can be found in Vaitukaitis et al.,
J. Clin. Endocrinol. Metab. 33:988-991 (1971).
[0358] Booster injections can be given at regular intervals, and
antiserum harvested when antibody titer thereof, as determined
semi-quantitatively, for example, by double immunodiffusion in agar
against known concentrations of the antigen, begins to fall. See, for
example, Ouchterlony, et al., Chap. 19 in: Handbook of Experimental
Immunology, D. Wier (ed), Blackwell (1973). Plateau concentration of
antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about
12 µM). Affinity of the antisera for the antigen is determined by
preparing competitive binding curves, as described, for example, by
Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2d Ed. (Rose
and Friedman, Eds.) Amer. Soc. For Microbiol., Washington, D.C.
(1980).
[0359] Antibody preparations prepared according to either of the
above protocols are useful in a variety of contexts. In particular,
the antibodies may be used in immunoaffinity chromatography
techniques such as those described below to facilitate large scale
isolation, purification, or enrichment of the proteins or
polypeptides encoded by EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids or for the isolation,
purification or enrichment of EST-related polypeptides, fragments of
EST-related polypeptides, positional segments of EST-related
polypeptides or fragments of positional segments of EST-related
polypeptides.

[0360] In the case of secreted proteins, the antibodies 
may be used for the isolation, purification, or enrichment 
of the full protein (i.e. the mature protein and the signal 
peptide), the mature protein (i.e. the protein generated 
by cleavage of the signal peptide), or the signal peptide, where the
encoding nucleic acids are operably linked to promoters and
introduced into cells as described above.

[0361] Additionally, the antibodies may be used in immunoaffinity
chromatography techniques such as those described below to isolate,
purify, or enrich polypeptides which have been linked to the proteins
or polypeptides encoded by EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids or to isolate, purify, or
enrich EST-related polypeptides, fragments of EST-related
polypeptides, positional segments of EST-related polypeptides or
fragments of positional segments of EST-related polypeptides.
[0362] The antibodies may also be used to determine the cellular
localization of the proteins or polypeptides encoded by EST-related
nucleic acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic acids or the
cellular localization of EST-related polypeptides, fragments of
EST-related polypeptides, positional segments of EST-related
polypeptides or fragments of positional segments of EST-related
polypeptides.

[0363] In addition, the antibodies may also be used to determine the
cellular localization of polypeptides which have been linked to the
proteins or polypeptides encoded by EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids or polypeptides
which have been linked to EST-related polypeptides, fragments of
EST-related polypeptides, positional segments of EST-related
polypeptides or fragments of positional segments of EST-related
polypeptides.
[0364] The antibodies may also be used in quantitative immunoassays
which determine concentrations of antigen-bearing substances in
biological samples; they may also be used semi-quantitatively or
qualitatively to identify the presence of antigen in a biological
sample or to identify the type of tissue present in a biological
sample. The antibodies may also be used in therapeutic compositions
for killing cells expressing the protein or reducing the levels of
the protein in the body.



V. Use of 5' ESTs and Consensus Contigated 5' ESTs
or Sequences Obtainable Therefrom or Portions
Thereof as Reagents

[0365] The EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids may be used as reagents in isolation
procedures, diagnostic assays, and forensic procedures. For example,
sequences from the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids may be detectably labeled and used as
probes to isolate other sequences capable of hybridizing to them. In
addition, the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids may be used to design PCR primers to be
used in isolation, diagnostic, or forensic procedures.


1. Use of EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids in 
isolation, diagnostic and forensic procedures 

EXAMPLE 34 

Preparation of PCR Primers and Amplification of DNA

[0366] The EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids may be used to prepare PCR primers for a
variety of applications, including isolation procedures for cloning
nucleic acids capable of hybridizing to such sequences, diagnostic
techniques and forensic techniques. In some embodiments, the PCR
primers are at least 10, 15, 18, 20, 23, 25, 28, 30, 40, or 50
nucleotides in length. In some embodiments, the PCR primers may be
more than 30 bases in length. It is preferred that the primer pairs
have approximately the same G/C ratio, so that melting temperatures
are approximately the same. A variety of PCR techniques are familiar
to those skilled in the art. For a review of PCR technology, see
Molecular Cloning to Genetic Engineering, White, B. A., Ed., in
Methods in Molecular Biology 67: Humana Press, Totowa 1997. In each
of these PCR procedures, PCR primers on either side of the nucleic
acid sequences to be amplified are added to a suitably prepared
nucleic acid sample along with dNTPs and a thermostable polymerase
such as Taq polymerase, Pfu polymerase, or Vent polymerase. The
nucleic acid in the sample is denatured and the PCR primers are
specifically hybridized to complementary nucleic acid sequences in
the sample. The hybridized primers are extended. Thereafter, another
cycle of denaturation, hybridization, and extension is initiated. The
cycles are repeated multiple times to produce an amplified fragment
containing the nucleic acid sequence between the primer sites.

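
The preference for primer pairs with similar G/C content can be checked with a short calculation. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C) degrees C, a rough approximation for short oligonucleotides that is brought in here for illustration and is not taken from the text. The primer sequences and the 4 degree tolerance are likewise hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Approximate melting temperature (deg C) by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def well_matched(forward, reverse, max_tm_difference=4):
    """True if the two primers have similar estimated Tm values."""
    difference = abs(wallace_tm(forward) - wallace_tm(reverse))
    return difference <= max_tm_difference


# Hypothetical 20-nucleotide primer pair.
forward = "AGCTGACCTGAAGGTCATGC"
reverse = "TGCATCGGATCCTTGAGCAA"
for name, primer in (("forward", forward), ("reverse", reverse)):
    print(name, f"GC={gc_fraction(primer):.2f}", f"Tm={wallace_tm(primer)}")
print("matched:", well_matched(forward, reverse))
```

For primers much longer than about 20 nucleotides a nearest-neighbor calculation would be more appropriate than this rule of thumb.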
EXAMPLE 35 

Use of the EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids as 
probes

[0367] Probes derived from EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids may be labeled with detectable
labels familiar to those skilled in the art, including radioisotopes
and non-radioactive labels, to provide a detectable probe. The
detectable probe may be single stranded or double stranded and may be
made using techniques known in the art, including in vitro
transcription, nick translation, or kinase reactions. A nucleic acid
sample containing a sequence capable of hybridizing to the labeled
probe is contacted with the labeled probe. If the nucleic acid in the
sample is double stranded, it may be denatured prior to contacting
the probe. In some applications, the nucleic acid sample may be
immobilized on a surface such as a nitrocellulose or nylon membrane.
The nucleic acid sample may comprise nucleic acids obtained from a
variety of sources, including genomic DNA, cDNA libraries, RNA, or
tissue samples.

[0368] Procedures used to detect the presence of nu- 
cleic acids capable of hybridizing to the detectable 
probe include well known techniques such as Southern 
blotting, Northern blotting, dot blotting, colony hybridi- 
zation, and plaque hybridization. In some applications, 
the nucleic acid capable of hybridizing to the labeled 
probe may be cloned into vectors such as expression 
vectors, sequencing vectors, or in vitro transcription 
vectors to facilitate the characterization and expression 
of the hybridizing nucleic acids in the sample. For example, such
techniques may be used to isolate and clone sequences in a genomic
library or cDNA library which are capable of hybridizing to the
detectable probe as described in Example 18 above.
[0369] PCR primers made as described in Example 34 above may be used
in forensic analyses, such as the DNA fingerprinting techniques
described in Examples 36-40 below. Such analyses may utilize
detectable probes or primers based on the sequences of the
EST-related nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related nucleic
acids.

EXAMPLE 36 

Forensic Matching by DNA Sequencing

[0370] In one exemplary method, DNA samples are isolated from
forensic specimens of, for example, hair, semen, blood or skin cells
by conventional methods. A panel of PCR primers based on a number of
the EST-related nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of EST-related
nucleic acids is then utilized in accordance with Example 34 to
amplify DNA of approximately 100-200 bases in length from the
forensic specimen. Corresponding sequences are obtained from a test
subject. Each of these identification DNAs is then sequenced using
standard techniques, and a simple database comparison determines the
differences, if any, between the sequences from the subject and those
from the sample. Statistically significant differences between the
suspect's DNA sequences and those from the sample conclusively prove
a lack of identity. This lack of identity can be proven, for example,
with only one sequence. Identity, on the other hand, should be
demonstrated with a large number of sequences, all matching.
Preferably, a minimum of 50 statistically identical sequences of 100
bases in length are used to prove identity between the suspect and
the sample.
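
The decision logic of this example can be sketched as a comparison of sequences keyed by locus. This is a simplified illustration under stated assumptions: the function names and locus labels are hypothetical, and any difference at a shared locus is treated as significant, whereas a real analysis would first account for sequencing error before declaring a difference statistically significant.

```python
def compare_loci(subject, specimen):
    """Compare sequences at loci present in both dictionaries.

    Returns (number_of_shared_loci, list_of_mismatched_loci).
    """
    shared = sorted(set(subject) & set(specimen))
    mismatched = [
        locus for locus in shared
        if subject[locus].upper() != specimen[locus].upper()
    ]
    return len(shared), mismatched


def verdict(subject, specimen, min_matching_loci=50):
    """One mismatch excludes; identity needs many loci, all matching."""
    compared, mismatched = compare_loci(subject, specimen)
    if mismatched:
        return "excluded"
    if compared >= min_matching_loci:
        return "match"
    return "inconclusive"


# Made-up data: 50 loci of 100 bases each.
subject = {f"locus{i}": "ACGT" * 25 for i in range(50)}
specimen = dict(subject)
print(verdict(subject, specimen))

specimen["locus7"] = "T" + specimen["locus7"][1:]  # one differing locus
print(verdict(subject, specimen))
```

The asymmetry mirrors the text: a single differing sequence suffices to show lack of identity, while identity requires a large panel with every sequence matching.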

EXAMPLE 37 

Positive Identification by DNA Sequencing 


[0371] The technique outlined in the previous example may also be
used on a larger scale to provide a unique fingerprint-type
identification of any individual. In this technique, primers are
prepared from a large number of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids. Preferably, 20 to 50 different
primers are used. These primers are used to obtain a corresponding
number of PCR-generated DNA segments from the individual in question
in accordance with Example 34. Each of these DNA segments is
sequenced, using the methods set forth in Example 36. The database of
sequences generated through this procedure uniquely identifies the
individual from whom the sequences were obtained. The same panel of
primers may then be used at any later time to absolutely correlate
tissue or other biological specimen with that individual.

EXAMPLE 38

Southern Blot Forensic Identification 




[0372] The procedure of Example 37 is repeated to obtain a panel of
at least 10 amplified sequences from an individual and a specimen.
Preferably, the panel contains at least 50 amplified sequences. More
preferably, the panel contains 100 amplified sequences. In some
embodiments, the panel contains 200 amplified sequences. This
PCR-generated DNA is then digested with one or a combination of,
preferably, four base specific restriction enzymes. Such enzymes are
commercially available and known to those of skill in the art. After
digestion, the resultant gene fragments are size separated in
multiple duplicate wells on an agarose gel and transferred to
nitrocellulose using Southern blotting techniques well known to those
with skill in the art. For a review of Southern blotting see Davis et
al. (Basic Methods in Molecular Biology, 1986, Elsevier Press, pp
62-65).

[0373] A panel of probes based on the sequences of the EST-related
nucleic acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic acids are
radioactively or colorimetrically labeled using methods known in the
art, such as nick translation or end labeling, and hybridized to the
Southern blot using techniques known in the art (Davis et al.,
supra). Preferably, the probes are at least 10, 12, 15, 18, 20, 25,
28, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 nucleotides in
length. In some embodiments, the probes are oligonucleotides which
are 40 nucleotides in length or less.

[0374] Preferably, at least 5 to 10 of these labeled probes are used,
and more preferably at least about 20 or 30 are used to provide a
unique pattern. The resultant bands appearing from the hybridization
of a large sample of EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids will be a unique identifier. Since the
restriction enzyme cleavage will be different for every individual,
the band pattern on the Southern blot will also be unique. Increasing
the number of probes will provide a statistically higher level of
confidence in the identification since there will be an increased
number of sets of bands used for identification.
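
How a sequence difference changes a band pattern can be illustrated with an in silico digest. The sketch below cuts at every occurrence of a four-base recognition site and reports fragment lengths. HaeIII (site GGCC, cut between GG and CC) is used purely as a familiar four-base cutter; the two allele sequences are invented, and the `digest` helper is hypothetical.

```python
def digest(sequence, site, cut_offset=0):
    """Return fragment lengths after cutting at every site occurrence.

    `cut_offset` is the position within the site where the strand is
    cut (2 for a GG^CC cut).
    """
    seq = sequence.upper()
    site = site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])
            if end > begin]


# Two invented alleles differing by one base inside the second site.
allele_a = "AAAAGGCCTTTTTTGGCCAA"
allele_b = "AAAAGGCCTTTTTTGGACAA"
print(digest(allele_a, "GGCC", cut_offset=2))
print(digest(allele_b, "GGCC", cut_offset=2))
```

A single base change that destroys a recognition site merges two fragments into one, which is the kind of difference that would appear as a shifted band on the blot.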

EXAMPLE 39 

Dot Blot Identification Procedure 

[0375] Another technique for identifying individuals 
using the EST-related nucleic acids, positional seg- 
ments of EST-related nucleic acids or fragments of po- 
sitional segments of EST-related nucleic acids dis- 
closed herein utilizes a dot blot hybridization technique. 
[0376] Genomic DNA is isolated from nuclei of the subject to be
identified. Probes are prepared that correspond to at least 10,
preferably 50 sequences from the EST-related nucleic acids,
positional segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids. The probes are used
to hybridize to the genomic DNA through conditions known to those in
the art. The oligonucleotides are end labeled with P32 using
polynucleotide kinase (Pharmacia). Dot blots are created by spotting
the genomic DNA onto nitrocellulose or the like using a vacuum dot
blot manifold (BioRad, Richmond, California). The nitrocellulose
filter containing the genomic sequences is baked or UV linked to the
filter, prehybridized and hybridized with labeled probe using
techniques known in the art (Davis et al., supra). The 32P labeled
DNA fragments are sequentially hybridized with successively stringent
conditions to detect minimal differences between the 30 bp sequence
and the DNA. Tetramethylammonium chloride is useful for identifying
clones containing small numbers of nucleotide mismatches (Wood et
al., Proc. Natl. Acad. Sci. USA 82(6):1585-1588 (1985)). A unique
pattern of dots distinguishes one individual from another individual.
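
The effect of increasing stringency on a dot pattern can be sketched by modeling stringency as the number of mismatches a probe tolerates. This is a deliberately crude model assumed here for illustration: it compares a probe with an equal-length target region on the same strand and ignores complementarity, salt, temperature and position effects. All names and sequences are hypothetical.

```python
def mismatches(probe, target):
    """Count positions at which two equal-length sequences differ."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have equal length")
    return sum(1 for p, t in zip(probe.upper(), target.upper()) if p != t)


def hybridizes(probe, target, max_mismatches):
    """Model a wash: signal remains if mismatches are tolerated."""
    return mismatches(probe, target) <= max_mismatches


def dot_pattern(probes, targets, max_mismatches=0):
    """One True/False dot per probe and target pair."""
    return tuple(
        hybridizes(p, t, max_mismatches) for p, t in zip(probes, targets)
    )


# Hypothetical 30-base probe and a variant with one mismatch.
probe = "ACGT" * 7 + "AC"
variant = "T" + probe[1:]

# Low stringency tolerates the mismatch; high stringency does not.
print(dot_pattern([probe, probe], [probe, variant], max_mismatches=2))
print(dot_pattern([probe, probe], [probe, variant], max_mismatches=0))
```

Repeating the comparison at successively lower mismatch tolerance corresponds to the successively stringent hybridizations described above.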

[0377] EST-related nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of EST-related
nucleic acids can be used as probes in the following alternative
fingerprinting technique. In some embodiments, the probes are
oligonucleotides which are 40 nucleotides in length or less.
[0378] Preferably, a plurality of probes having sequences from
different EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids are used in the alternative fingerprinting
technique. Example 40 below provides a representative alternative
fingerprinting procedure in which the probes are derived from
EST-related nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related nucleic
acids.






EXAMPLE 40 



Alternative "Fingerprint" Identification Technique



[0379] Oligonucleotides are prepared from a large number, e.g. 50,
100, or 200, EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional segments of
EST-related nucleic acids using commercially available
oligonucleotide services such as Genset, Paris, France. Preferably,
the oligonucleotides are at least 10, 15, 18, 20, 23, 25, 28, or 30
nucleotides in length. However, in some embodiments, the
oligonucleotides may be more than 30 nucleotides in length.

[0380] Cell samples from the test subject are processed for
DNA using techniques well known to those with skill in the
art. The nucleic acid is digested with restriction enzymes
such as EcoRI and XbaI. Following digestion, samples are
applied to wells for electrophoresis. The procedure, as
known in the art, may be modified to accommodate
polyacrylamide electrophoresis; however, in this example,
samples containing 5 µg of DNA are loaded into wells and
separated on 0.8% agarose gels. The gels are transferred
onto nitrocellulose using standard Southern blotting
techniques.
[0381] 10 ng of each of the oligonucleotides are pooled and
end-labeled with 32P. The nitrocellulose is prehybridized
with blocking solution and hybridized with the labeled
probes. Following hybridization and washing, the
nitrocellulose filter is exposed to X-Omat AR X-ray film.
The resulting hybridization pattern will be unique for each
individual.

[0382] It is additionally contemplated within this ex- 
ample that the number of probe sequences used can be 
varied for additional accuracy or clarity. 
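
The reason the pattern in Example 40 differs between individuals can be sketched in code. The following is a minimal illustration only, assuming toy sequences and treating "hybridization" as an exact substring match on either strand; the function names (digest, band_pattern) and the sequences are hypothetical and do not come from the procedure above. The recognition sites and cut positions used are those of EcoRI (G^AATTC) and XbaI (T^CTAGA).

```python
# Toy in-silico version of digest + Southern blot + probe hybridization.
# All names and sequences are illustrative assumptions.

# enzyme name -> (recognition site, cut offset within the site)
RECOGNITION_SITES = {"EcoRI": ("GAATTC", 1), "XbaI": ("TCTAGA", 1)}


def digest(seq, enzymes=RECOGNITION_SITES):
    """Return the fragments produced by cutting seq at every site."""
    cuts = set()
    for site, offset in enzymes.values():
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def band_pattern(seq, probes):
    """Sorted sizes of fragments that contain at least one probe."""
    sizes = []
    for fragment in digest(seq):
        if any(p in fragment or revcomp(p) in fragment for p in probes):
            sizes.append(len(fragment))
    return sorted(sizes)


if __name__ == "__main__":
    probe = "ACCGGTT"
    # Individual B carries a single-base change that destroys one EcoRI site.
    individual_a = "TTTTGAATTCAACCGGTTAAGAATTCTTTT"
    individual_b = "TTTTGAATTCAACCGGTTAAGAGTTCTTTT"
    print(band_pattern(individual_a, [probe]))
    print(band_pattern(individual_b, [probe]))
```

With a pool of many probes, each probe contributes its own set of band sizes, so the combined pattern becomes increasingly specific to the individual.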
[0383] In addition to their applications in forensics and
identification, EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be
mapped to their chromosomal locations. Example 41 below
describes radiation hybrid (RH) mapping of human
chromosomal regions using EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids. Example 42 below describes a representative
procedure for mapping EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids to their
locations on human chromosomes. Example 43 below describes
mapping of EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids on metaphase
chromosomes by Fluorescence In Situ Hybridization (FISH).

2. Use of EST-related nucleic acids, positional 
segments of EST-related nucleic acids or fragments of 
positional segments of EST-related nucleic acids in 
Chromosome Mapping 

EXAMPLE 41 

Radiation hybrid mapping of EST-related nucleic acids, 
positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic 
acids to the human genome 

[0384] Radiation hybrid (RH) mapping is a somatic cell
genetic approach that can be used for high resolution
mapping of the human genome. In this approach, cell lines
containing one or more human chromosomes are lethally
irradiated, breaking each chromosome into fragments whose
size depends on the radiation dose. These fragments are
rescued by fusion with cultured rodent cells, yielding
subclones containing different portions of the human
genome. This technique is described by Benham et al.
(Genomics 4:509-517, 1989) and Cox et al. (Science
250:245-250, 1990). The random and independent nature of
the subclones permits efficient mapping of any human genome
marker. Human DNA isolated from a panel of 80-100 cell
lines provides a mapping reagent for ordering EST-related
nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related
nucleic acids. In this approach, the frequency of breakage
between markers is used to measure distance, allowing
construction of fine resolution maps as has been done using
conventional ESTs (Schuler et al., Science 274:540-546,
1996).
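
How a breakage frequency can be turned into a distance may be easier to see in code. The sketch below is an assumption-laden illustration, not the method of this example: it uses one commonly cited two-point estimator, in which the number of hybrids retaining exactly one of two markers is divided by T(RA + RB - 2 RA RB), where T is the panel size and RA, RB are the retention fractions, and converts the result to centiRays as -100 ln(1 - theta). The function name and the ten-hybrid retention data are hypothetical.

```python
import math


def breakage_estimate(retention_a, retention_b):
    """Two-point breakage frequency (theta) and distance in centiRays.

    retention_a / retention_b: one 0/1 entry per hybrid cell line,
    1 meaning the marker was detected in that hybrid.
    """
    if not retention_a or len(retention_a) != len(retention_b):
        raise ValueError("retention vectors must be non-empty and equal length")
    total = len(retention_a)
    # Hybrids that kept exactly one of the two markers imply a break between them.
    discordant = sum(1 for a, b in zip(retention_a, retention_b) if a != b)
    ra = sum(retention_a) / total
    rb = sum(retention_b) / total
    denom = total * (ra + rb - 2 * ra * rb)
    if denom == 0:
        raise ValueError("no informative hybrids for this marker pair")
    theta = min(discordant / denom, 0.999999)  # keep the logarithm finite
    return theta, -100.0 * math.log(1.0 - theta)


if __name__ == "__main__":
    marker_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    marker_b = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
    theta, centirays = breakage_estimate(marker_a, marker_b)
    print(f"theta = {theta:.3f}, distance = {centirays:.1f} cR")
```

Markers that are almost always retained or lost together give a theta near zero and are therefore placed close to one another on the map.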

[0385] RH mapping has been used to generate a
high-resolution whole genome radiation hybrid map of human
chromosome 17q22-q25.3 across the genes for growth hormone
(GH) and thymidine kinase (TK) (Foster et al., Genomics
33:185-192, 1996), the region surrounding the Gorlin
syndrome gene (Obermayr et al., Eur. J. Hum. Genet.
4:242-245, 1996), 60 loci covering the entire short arm of
chromosome 12 (Raeymaekers et al., Genomics 29:170-178,
1995), the region of human chromosome 22 containing the
neurofibromatosis type 2 locus (Frazer et al., Genomics
14:574-584, 1992) and 13 loci on the long arm of chromosome
5 (Warrington et al., Genomics 11:701-708, 1991).


EXAMPLE 42 

Mapping of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids to
Human Chromosomes using PCR techniques

[0386] EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids may be assigned to
human chromosomes using PCR based methodologies. In such
approaches, oligonucleotide primer pairs are designed from
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to minimize the
chance of amplifying through an intron. Preferably, the
oligonucleotide primers are 18-23 bp in length and are
designed for PCR amplification. The creation of PCR primers
from known sequences is well known to those with skill in
the art. For a review of PCR technology see Erlich in PCR
Technology: Principles and Applications for DNA
Amplification. 1992. W.H. Freeman and Co., New York.

[0387] The primers are used in polymerase chain reactions
(PCR) to amplify templates from total human genomic DNA.
PCR conditions are as follows: 60 ng of genomic DNA is used
as a template for PCR with 80 ng of each oligonucleotide
primer, 0.6 unit of Taq polymerase, and 1 µCi of a
32P-labeled deoxycytidine triphosphate. The PCR is
performed in a microplate thermocycler (Techne) under the
following conditions: 30 cycles of 94°C, 1.4 min; 55°C, 2
min; and 72°C, 2 min; with a final extension at 72°C for 10
min. The amplified products are analyzed on a 6%
polyacrylamide sequencing gel and visualized by
autoradiography. If the length of the resulting PCR product
is identical to the distance between the ends of the primer
sequences in the 5'EST from which the primers are derived,
then the PCR reaction is repeated with DNA templates from
two panels of human-rodent somatic cell hybrids, BIOS
PCRable DNA (BIOS Corporation) and NIGMS Human-Rodent
Somatic Cell Hybrid Mapping Panel Number 1 (NIGMS, Camden,
NJ).
[0388] PCR is used to screen a series of somatic cell
hybrid cell lines containing defined sets of human
chromosomes for the presence of a given 5'EST. DNA is
isolated from the somatic hybrids and used as starting
templates for PCR reactions using the primer pairs from the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids. Only those somatic
cell hybrids with chromosomes containing the human gene
corresponding to the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids will yield
an amplified fragment. The 5'ESTs are assigned to a
chromosome by analysis of the segregation pattern of PCR
products from the somatic hybrid DNA templates. The single
human chromosome present in all cell hybrids that give rise
to an amplified fragment is the chromosome containing that
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids. For a review of
techniques and analysis of results from somatic cell gene
mapping experiments, see Ledbetter et al., Genomics
6:475-481 (1990).
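
The segregation analysis described above amounts to a set intersection, which the following minimal sketch illustrates. The panel composition, hybrid names, and PCR results are invented for the illustration, and the function name assign_chromosome is hypothetical. In addition to requiring presence in every PCR-positive hybrid, the sketch also excludes chromosomes found in PCR-negative hybrids; that second filter is an assumption drawn from the statement that only hybrids carrying the relevant chromosome yield a product.

```python
def assign_chromosome(panel, pcr_positive):
    """Return the set of chromosomes consistent with the PCR results.

    panel:        hybrid name -> set of human chromosomes it carries
    pcr_positive: hybrid name -> True if an amplified fragment was seen
    """
    positives = [panel[h] for h, hit in pcr_positive.items() if hit]
    negatives = [panel[h] for h, hit in pcr_positive.items() if not hit]
    if not positives:
        return set()
    # Chromosomes present in every hybrid that gave a product ...
    candidates = set.intersection(*positives)
    # ... and absent from every hybrid that gave none.
    for chromosomes in negatives:
        candidates -= chromosomes
    return candidates


if __name__ == "__main__":
    panel = {
        "H1": {"1", "7", "X"},
        "H2": {"7", "12"},
        "H3": {"1", "12"},
        "H4": {"7", "X", "21"},
    }
    results = {"H1": True, "H2": True, "H3": False, "H4": True}
    print(assign_chromosome(panel, results))
```

An empty result or a result with more than one chromosome would suggest that the panel is not informative enough, or that a hybrid carries a chromosome fragment not recorded in the panel description.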

[0389] Alternatively, the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids may be mapped to individual chromosomes using FISH as
described in Example 43 below.

EXAMPLE 43 

Mapping of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids to
Chromosomes Using Fluorescence In Situ Hybridization

[0390] Fluorescence in situ hybridization allows the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to be mapped to a
particular location on a given chromosome. The chromosomes
to be used for fluorescence in situ hybridization
techniques may be obtained from a variety of sources
including cell cultures, tissues, or whole blood.
[0391] In a preferred embodiment, chromosomal localization
of EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids is obtained by FISH
as described by Cherif et al. (Proc. Natl. Acad. Sci.
U.S.A. 87:6639-6643, 1990). Metaphase chromosomes are
prepared from phytohemagglutinin (PHA)-stimulated blood
cell donors. PHA-stimulated lymphocytes from healthy males
are cultured for 72 h in RPMI-1640 medium. For
synchronization, methotrexate (10 µM) is added for 17 h,
followed by addition of 5-bromodeoxyuridine (5-BrdU, 0.1
mM) for 6 h. Colcemid (1 µg/ml) is added for the last 15
min before harvesting the cells. Cells are collected,
washed in RPMI, incubated with a hypotonic solution of KCl
(75 mM) at 37°C for 15 min and fixed in three changes of
methanol:acetic acid (3:1). The cell suspension is dropped
onto a glass slide and air dried. The EST-related nucleic
acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids is labeled with biotin-16 dUTP by nick translation
according to the manufacturer's instructions (Bethesda
Research Laboratories, Bethesda, MD), purified using a
Sephadex G-50 column (Pharmacia, Uppsala, Sweden) and
precipitated. Just prior to hybridization, the DNA pellet
is dissolved in hybridization buffer (50% formamide, 2 X
SSC, 10% dextran sulfate, 1 mg/ml sonicated salmon sperm
DNA, pH 7) and the probe is denatured at 70°C for 5-10 min.

[0392] Slides kept at -20°C are treated for 1 h at 37°C
with RNase A (100 µg/ml), rinsed three times in 2 X SSC and
dehydrated in an ethanol series. Chromosome preparations
are denatured in 70% formamide, 2 X SSC for 2 min at 70°C,
then dehydrated at 4°C. The slides are treated with
proteinase K (10 µg/100 ml in 20 mM Tris-HCl, 2 mM CaCl2)
at 37°C for 8 min and dehydrated. The hybridization mixture
containing the probe is placed on the slide, covered with a
coverslip, sealed with rubber cement and incubated
overnight in a humid chamber at 37°C. After hybridization
and post-hybridization washes, the biotinylated probe is
detected by avidin-FITC and amplified with additional
layers of biotinylated goat anti-avidin and avidin-FITC.
For chromosomal localization, fluorescent R-bands are
obtained as previously described (Cherif et al., supra).
The slides are observed under a LEICA fluorescence
microscope (DMRXA). Chromosomes are counterstained with
propidium iodide and the fluorescent signal of the probe
appears as two symmetrical yellow-green spots on both
chromatids of the fluorescent R-band chromosome (red).
Thus, a particular EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be
localized to a particular cytogenetic R-band on a given
chromosome. Once the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids have been
assigned to particular chromosomes using the techniques
described in Examples 41-43 above, they may be utilized to
construct a high resolution map of the chromosomes on which
they are located or to identify the chromosomes in a
sample.

EXAMPLE 44 

Use of EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to Construct or
Expand Chromosome Maps

[0393] Chromosome mapping involves assigning a given unique
sequence to a particular chromosome as described above.
Once the unique sequence has been mapped to a given
chromosome, it is ordered relative to other unique
sequences located on the same chromosome. One approach to
chromosome mapping utilizes a series of yeast artificial
chromosomes (YACs) bearing several thousand long inserts
derived from the chromosomes of the organism from which the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids are obtained. This
approach is described in Ramaiah Nagaraja et al., Genome
Research 7:210-222, March 1997. Briefly, in this approach
each chromosome is broken into overlapping pieces which are
inserted into the YAC vector. The YAC inserts are screened
using PCR or other methods to determine whether they
include the EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids whose position is to
be determined. Once an insert has been found which includes
the 5'EST, the insert can be analyzed by PCR or other
methods to determine whether the insert also contains other
sequences known to be on the chromosome or in the region
from which the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids was
derived. This process can be repeated for each insert in
the YAC library to determine the location of each of the
EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids relative to one
another and to other known chromosomal markers. In this
way, a high resolution map of the distribution of numerous
unique markers along each of the organism's chromosomes may
be obtained.

[0394] As described in Example 45 below, EST-related
nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related
nucleic acids may also be used to identify genes associated
with a particular phenotype, such as hereditary disease or
drug response.

3. Use of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids in Gene
Identification

EXAMPLE 45 

Identification of genes associated with hereditary
diseases or drug response

[0395] This example illustrates an approach useful for the
association of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids with
particular phenotypic characteristics. In this example, a
particular EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids is used as a test
probe to associate that EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids with a particular phenotypic characteristic.
[0396] EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids are mapped to a
particular location on a human chromosome using techniques
such as those described in Examples 41 and 42 or other
techniques known in the art. A search of Mendelian
Inheritance in Man (V. McKusick, Mendelian Inheritance in
Man, available on line through Johns Hopkins University
Welch Medical Library) reveals the region of the human
chromosome which contains the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids to be a very gene rich region containing several
known genes and several diseases or phenotypes for which
genes have not been identified. The gene corresponding to
this EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids thus becomes an
immediate candidate for each of these genetic diseases.
[0397] Cells from patients with these diseases or
phenotypes are isolated and expanded in culture. PCR
primers from the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids are used
to screen genomic DNA, mRNA or cDNA obtained from the
patients. EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids that are not
amplified in the patients can be positively associated with
a particular disease by further analysis. Alternatively,
the PCR analysis may yield fragments of different lengths
when the samples are derived from an individual having the
phenotype associated with the disease than when the sample
is derived from a healthy individual, indicating that the
gene containing the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may be
responsible for the genetic disease.

VI. Use of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids
to Construct Vectors

[0398] The present EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids may also
be used to construct secretion vectors capable of directing
the secretion of the proteins encoded by genes therein.
Such secretion vectors may facilitate the purification or
enrichment of the proteins encoded by genes inserted
therein by reducing the number of background proteins from
which the desired protein must be purified or enriched.
Exemplary secretion vectors are described in Example 46
below.

1. Construction of secretion vectors

EXAMPLE 46 

Construction of Secretion Vectors 

[0399] The secretion vectors of the present invention 
include a promoter capable of directing gene expression 
in the host cell, tissue, or organism of interest. Such pro- 
moters include the Rous Sarcoma Virus promoter, the 
SV40 promoter, the human cytomegalovirus promoter, 
and other promoters familiar to those skilled in the art. 
[0400] A signal sequence from one of the EST-related
nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related
nucleic acids is operably linked to the promoter such that
the mRNA transcribed from the promoter will direct the
translation of the signal peptide. Preferably, the signal
sequence is from one of the nucleic acids of SEQ ID
NOs.:24-4100. The host cell, tissue, or organism may be any
cell, tissue, or organism which recognizes the signal
peptide encoded by the signal sequence in the EST-related
nucleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related
nucleic acids. Suitable hosts include mammalian cells,
tissues or organisms, avian cells, tissues, or organisms,
insect cells, tissues or organisms, or yeast.

[0401] In addition, the secretion vector contains cloning
sites for inserting genes encoding the proteins which are
to be secreted. The cloning sites facilitate the cloning of
the insert gene in frame with the signal sequence such that
a fusion protein in which the signal peptide is fused to
the protein encoded by the inserted gene is expressed from
the mRNA transcribed from the promoter. The signal peptide
directs the extracellular secretion of the fusion protein.
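
The requirement that the insert be cloned in frame with the signal sequence can be expressed as a simple check, sketched below. This is an illustration under stated assumptions: both inputs are plain coding sequences, the signal sequence begins with ATG and carries no stop codon, and the insert ends with a stop codon. The function name fuse_in_frame and the two short sequences are hypothetical and are not taken from SEQ ID NOs. 24-4100.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split seq into consecutive triplets, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def fuse_in_frame(signal_cds, insert_cds):
    """Join signal and insert coding sequences, raising if the frame is broken."""
    signal_cds = signal_cds.upper()
    insert_cds = insert_cds.upper()
    if not signal_cds.startswith("ATG"):
        raise ValueError("signal sequence must start with ATG")
    if len(signal_cds) % 3 != 0:
        raise ValueError("signal sequence length shifts the reading frame")
    if len(insert_cds) % 3 != 0:
        raise ValueError("insert length is not a whole number of codons")
    fused = signal_cds + insert_cds
    triplets = codons(fused)
    if any(c in STOP_CODONS for c in triplets[:-1]):
        raise ValueError("premature stop codon in the fusion")
    if triplets[-1] not in STOP_CODONS:
        raise ValueError("fusion does not end with a stop codon")
    return fused


if __name__ == "__main__":
    signal = "ATGAAACTGCTG"   # toy signal sequence, 4 codons
    insert = "GCTGAAGGTTAA"   # toy insert, 3 codons plus stop
    print(fuse_in_frame(signal, insert))
```

A real vector would also have to account for any linker bases contributed by the cloning site between the two sequences; the sketch omits them for brevity.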

[0402] The secretion vector may be DNA or RNA and may
integrate into the chromosome of the host, be stably
maintained as an extrachromosomal replicon in the host, be
an artificial chromosome, or be transiently present in the
host. Preferably, the secretion vector is maintained in
multiple copies in each host cell. As used herein, multiple
copies means at least 2, 5, 10, 20, 25, 50 or more than 50
copies per cell. In some embodiments, the multiple copies
are maintained extrachromosomally. In other embodiments,
the multiple copies result from amplification of a
chromosomal sequence.
[0403] Many nucleic acid backbones suitable for use as
secretion vectors are known to those skilled in the art,
including retroviral vectors, SV40 vectors, Bovine
Papilloma Virus vectors, yeast integrating plasmids, yeast
episomal plasmids, yeast artificial chromosomes, human
artificial chromosomes, P element vectors, baculovirus
vectors, or bacterial plasmids capable of being transiently
introduced into the host.

[0404] The secretion vector may also contain a polyA signal
such that the polyA signal is located downstream of the
gene inserted into the secretion vector.
[0405] After the gene encoding the protein for which
secretion is desired is inserted into the secretion vector,
the secretion vector is introduced into the host cell,
tissue, or organism using calcium phosphate precipitation,
DEAE-Dextran, electroporation, liposome-mediated
transfection, viral particles or as naked DNA. The protein
encoded by the inserted gene is then purified or enriched
from the supernatant using conventional techniques such as
ammonium sulfate precipitation, immunoprecipitation,
immunoaffinity chromatography, size exclusion
chromatography, ion exchange chromatography, and HPLC.
Alternatively, the secreted protein may be in a
sufficiently enriched or pure state in the supernatant or
growth media of the host to permit it to be used for its
intended purpose without further enrichment.
[0406] The signal sequences may also be inserted into
vectors designed for gene therapy. In such vectors, the
signal sequence is operably linked to a promoter such that
mRNA transcribed from the promoter encodes the signal
peptide. A cloning site is located downstream of the signal
sequence such that a gene encoding a protein whose
secretion is desired may readily be inserted into the
vector and fused to the signal sequence. The vector is
introduced into an appropriate host cell. The protein
expressed from the promoter is secreted extracellularly,
thereby producing a therapeutic effect.

EXAMPLE 47

Fusion Vectors

[0407] The EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids may be used to
construct fusion vectors for the expression of chimeric
polypeptides. The chimeric polypeptides comprise a first
polypeptide portion and a second polypeptide portion. In
the fusion vectors of the present invention, nucleic acids
encoding the first polypeptide portion and the second
polypeptide portion are joined in frame with one another so
as to generate a nucleic acid encoding the chimeric
polypeptide. The nucleic acid encoding the chimeric
polypeptide is operably linked to a promoter which directs
the expression of an mRNA encoding the chimeric
polypeptide. The promoter may be in any of the expression
vectors described herein including those described in
Examples 20 and 46.

[0408] Preferably, the fusion vector is maintained in
multiple copies in each host cell. In some embodiments, the
multiple copies are maintained extrachromosomally. In other
embodiments, the multiple copies result from amplification
of a chromosomal sequence.

[0409] The first polypeptide portion may comprise any of
the polypeptides encoded by the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids. In some embodiments, the first polypeptide portion
may be one of the EST-related polypeptides, fragments of
EST-related polypeptides, positional segments of
EST-related polypeptides, or fragments of positional
segments of EST-related polypeptides.

[0410] The second polypeptide portion may comprise any
polypeptide of interest. In some embodiments, the second
polypeptide portion may comprise a polypeptide having a
detectable enzymatic activity such as green fluorescent
protein or β galactosidase. Chimeric polypeptides in which
the second polypeptide portion comprises a detectable
polypeptide may be used to determine the intracellular
localization of the first polypeptide portion. In such
procedures, the fusion vector encoding the chimeric
polypeptide is introduced into a host cell under conditions
which facilitate the expression of the chimeric
polypeptide. Where appropriate, the cells are treated with
a detection reagent which is visible under the microscope
following a catalytic reaction with the detectable
polypeptide and the cellular location of the detection
reagent is determined. For example, if the polypeptide
having a detectable enzymatic activity is β galactosidase,
the cells may be treated with Xgal. Alternatively, where
the detectable polypeptide is directly detectable without
the addition of a detection reagent, the intracellular
location of the chimeric polypeptide is determined by
performing microscopy under conditions in which the
detectable polypeptide is visible. For example, if the
detectable polypeptide is green fluorescent protein or a
modified version thereof, microscopy is performed by
exposing the host cells to light having an appropriate
wavelength to cause the green fluorescent protein or
modified version thereof to fluoresce.
[0411] Alternatively, the second polypeptide portion 
may comprise a polypeptide whose isolation, purifica- 
tion, or enrichment is desired. In such embodiments, the 
isolation, purification, or enrichment of the second 
polypeptide portion may be achieved by performing the 
immunoaffinity chromatography procedures described 
below using an immunoaffinity column having an anti- 
body directed against the first polypeptide portion cou- 
pled thereto. 

[0412] The proteins encoded by the EST-related nucleic
acids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids or the EST-related polypeptides, fragments of
EST-related polypeptides, positional segments of
EST-related polypeptides, or fragments of positional
segments of EST-related polypeptides may also be used to
generate antibodies as explained in Examples 20 and 33 in
order to identify the tissue type or cell species from
which a sample is derived as described in Example 48.

EXAMPLE 48 

Identification of Tissue Types or Cell Species by Means
of Labeled Tissue Specific Antibodies

[0413] Identification of specific tissues is accomplished
by the visualization of tissue specific antigens by means
of antibody preparations according to Examples 20 and 33
which are conjugated, directly or indirectly, to a
detectable marker. Selected labeled antibody species bind
to their specific antigen binding partner in tissue
sections, cell suspensions, or in extracts of soluble
proteins from a tissue sample to provide a pattern for
qualitative or semi-qualitative interpretation.

[0414] Antisera for these procedures must have a potency
exceeding that of the native preparation, and for that
reason, antibodies are concentrated to a mg/ml level by
isolation of the gamma globulin fraction, for example, by
ion-exchange chromatography or by ammonium sulfate
fractionation. Also, to provide the most specific antisera,
unwanted antibodies, for example to common proteins, must
be removed from the gamma globulin fraction, for example by
means of insoluble immunoabsorbents, before the antibodies
are labeled with the marker. Either monoclonal or
heterologous antisera is suitable for either procedure.

1. Immunohistochemical Techniques 


[0415] Purified, high-titer antibodies, prepared as
described above, are conjugated to a detectable marker, as
described, for example, by Fudenberg, H., Chap. 26 in:
Basic and Clinical Immunology, 3rd Ed. Lange, Los Altos,
California (1980) or Rose, et al., Chap. 12 in: Methods in
Immunodiagnosis, 2d Ed. John Wiley and Sons, New York
(1980).

[0416] A fluorescent marker, either fluorescein or
rhodamine, is preferred, but antibodies can also be labeled
with an enzyme that supports a color producing reaction
with a substrate, such as horseradish peroxidase. Markers
can be added to tissue-bound antibody in a second step, as
described below. Alternatively, the specific antitissue
antibodies can be labeled with ferritin or other electron
dense particles, and localization of the ferritin coupled
antigen-antibody complexes achieved by means of an electron
microscope. In yet another approach, the antibodies are
radiolabeled, with, for example 125I, and detected by
overlaying the antibody treated preparation with
photographic emulsion.

[0417] Preparations to carry out the procedures can
comprise monoclonal or polyclonal antibodies to a single
protein or peptide identified as specific to a tissue type,
for example, brain tissue, or antibody preparations to
several antigenically distinct tissue specific antigens can
be used in panels, independently or in mixtures, as
required.

[0418] Tissue sections and cell suspensions are prepared
for immunohistochemical examination according to common
histological techniques. Multiple cryostat sections (about
4 µm, unfixed) of the unknown tissue and known control are
mounted and each slide covered with different dilutions of
the antibody preparation. Sections of known and unknown
tissues should also be treated with preparations to provide
a positive control, a negative control, for example,
pre-immune sera, and a control for non-specific staining,
for example, buffer.
[0419] Treated sections are incubated in a humid chamber
for 30 min at room temperature, rinsed, then washed in
buffer for 30-45 min. Excess fluid is blotted away, and the
marker developed.
[0420] If the tissue specific antibody was not labeled in
the first incubation, it can be labeled at this time in a
second antibody-antibody reaction, for example, by adding
fluorescein- or enzyme-conjugated antibody against the
immunoglobulin class of the antiserum-producing species,
for example, fluorescein labeled antibody to mouse IgG.
Such labeled sera are commercially available.

[0421] The antigen found in the tissues by the above 
procedure can be quantified by measuring the intensity 
of color or fluorescence on the tissue section, and cali- 
brating that signal using appropriate standards. 
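
By way of illustration only, the calibration step may be sketched as follows. The standards shown are hypothetical values, and linear interpolation between neighboring standards is an assumption; the description above does not prescribe a particular calibration model.

```python
from bisect import bisect_left


def calibrate(signal, standards):
    """Convert a measured intensity into an antigen amount.

    standards: list of (intensity, known_amount) pairs (hypothetical values).
    The amount is linearly interpolated between the two nearest standards.
    """
    points = sorted(standards)
    intensities = [p[0] for p in points]
    if not intensities[0] <= signal <= intensities[-1]:
        raise ValueError("signal lies outside the calibrated range")
    i = bisect_left(intensities, signal)
    if intensities[i] == signal:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (signal - x0) / (x1 - x0)


# Hypothetical standards: (measured intensity, known antigen amount)
standards = [(10.0, 0.0), (50.0, 2.0), (90.0, 4.0)]
estimate = calibrate(70.0, standards)
```

Signals outside the range of the standards are rejected rather than extrapolated, since the response is not known to remain linear there.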

2. Identification of Tissue Specific Soluble Proteins

[0422] The visualization of tissue specific proteins 
and identification of unknown tissues from that proce- 
dure is carried out using the labeled antibody reagents 
and detection strategy as described for immunohisto- 
chemistry; however the sample is prepared according 
to an electrophoretic technique to distribute the proteins 
extracted from the tissue in an orderly array on the basis
of molecular weight for detection. 
[0423] A tissue sample is homogenized using a Virtis 
apparatus; cell suspensions are disrupted by Dounce 
homogenization or osmotic lysis, using detergents in ei- 
ther case as required to disrupt cell membranes, as is 
the practice in the art. Insoluble cell components such 
as nuclei, microsomes, and membrane fragments are 
removed by ultracentrifugation, and the soluble protein-
containing fraction concentrated if necessary and re-
served for analysis.

[0424] A sample of the soluble protein solution is re-
solved into individual protein species by conventional
SDS polyacrylamide electrophoresis as described, for
example, by Davis, L. et al., Section 19-2 in: Basic Meth-
ods in Molecular Biology (P. Leder, ed.), Elsevier, New
York (1986), using a range of amounts of polyacrylamide
in a set of gels to resolve the entire molecular weight
range of proteins to be detected in the sample. A size
marker is run in parallel for purposes of estimating mo-
lecular weights of the constituent proteins. Sample size
for analysis is a convenient volume of from 5 to 55 µl,
containing from about 1 to 100 µg protein. An aliquot
of each of the resolved proteins is transferred by blotting
to a nitrocellulose filter paper, a process that maintains
the pattern of resolution. Multiple copies are prepared.
The procedure, known as Western Blot Analysis, is well
described in Davis, L. et al., supra, Section 19-3. One
set of nitrocellulose blots is stained with Coomassie
Blue dye to visualize the entire set of proteins for com-
parison with the antibody bound proteins. The remaining
nitrocellulose filters are then incubated with a solution
of one or more specific antisera to tissue specific pro-
teins prepared as described in Examples 20 and 33. In
this procedure, as in procedure A above, appropriate
positive and negative sample and reagent controls are
run.
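
The use of the size marker to estimate molecular weights may be illustrated by the following sketch. It assumes the commonly used approximation that log10 of molecular weight varies linearly with relative migration in the gel; the marker values are hypothetical and the function names are illustrative only.

```python
import math


def fit_log_mw(markers):
    """Least-squares line log10(MW) = a + b * Rf through the marker bands.

    markers: list of (relative_migration, molecular_weight_kDa) pairs.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_mw(rf, a, b):
    """Molecular weight (kDa) predicted for a band at relative migration rf."""
    return 10 ** (a + b * rf)


# Hypothetical marker lane: (relative migration, molecular weight in kDa)
markers = [(0.15, 97.0), (0.35, 66.0), (0.55, 45.0), (0.75, 31.0), (0.90, 21.5)]
a, b = fit_log_mw(markers)
unknown_band_kda = estimate_mw(0.55, a, b)
```

Because a single gel concentration is linear only over a limited size range, a separate fit would be made for each gel in the set.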

[0425] In either procedure described above a detect-
able label can be attached to the primary tissue antigen-
primary antibody complex according to various strate-
gies and permutations thereof. In a straightforward ap-
proach, the primary specific antibody can be labeled; al-
ternatively, the unlabeled complex can be bound by a
labeled secondary anti-IgG antibody. In other approach-
es, either the primary or secondary antibody is conju-
gated to a biotin molecule, which can, in a subsequent
step, bind an avidin conjugated marker. According to yet
another strategy, enzyme labeled or radioactive protein
A, which has the property of binding to any IgG, is bound
in a final step to either the primary or secondary anti-
body.

EXAMPLE 49 

Immunohistochemical Localization of Polypeptides 

[0426] The antibodies prepared as described in Ex-
amples 20 and 33 above may be utilized to determine
the cellular location of a polypeptide. The polypeptide
may be any of the polypeptides encoded by EST-related
nucleic acids, positional segments of EST-related nu-
cleic acids or fragments of positional segments of EST-
related nucleic acids or the polypeptide may be one of
the EST-related polypeptides, fragments of EST-related
polypeptides, positional segments of EST-related
polypeptides, or fragments of positional segments of
EST-related polypeptides. In some embodiments, the
polypeptide may be a chimeric polypeptide such as
those encoded by the fusion vectors of Example 47.
[0427] Cells expressing the polypeptide to be local- 
ized are applied to a microscope slide and fixed using 
any of the procedures typically employed in immunohis-
tochemical localization techniques, including the meth-
ods described in Current Protocols in Molecular Biology,
John Wiley and Sons, Inc. 1997. Following a washing
step, the cells are contacted with the antibody. In some
embodiments, the antibody is conjugated to a detecta-
ble marker as described above to facilitate detection. Al-
ternatively, in some embodiments, after the cells have
been contacted with an antibody to the polypeptide to 
be localized, a secondary antibody which has been con- 
jugated to a detectable marker is placed in contact with 
the antibody against the polypeptide to be localized. 
[0428] Thereafter, microscopy is performed under 
conditions suitable for visualizing the cellular location of 
the polypeptide. 

[0429] The visualization of tissue specific antigen
binding at levels above those seen in control tissues to
one or more tissue specific antibodies, directed against
the polypeptides encoded by EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids or antibodies against the EST-related polypep-
tides, fragments of EST-related polypeptides, positional
segments of EST-related polypeptides, or fragments of
positional segments of EST-related polypeptides, can 
identify tissues of unknown origin, for example, forensic 
samples, or differentiated tumor tissue that has metas- 
tasized to foreign bodily sites. 

[0430] The antibodies of Examples 20 and 33 may also
be used in the immunoaffinity chromatography tech- 
niques described below to isolate, purify or enrich the 
polypeptides encoded by the EST-related nucleic acids, 
positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic 
acids or to isolate, purify or enrich EST-related polypep- 
tides, fragments of EST-related polypeptides, positional 
segments of EST-related polypeptides, or fragments of 
positional segments of EST-related polypeptides. The 
immunoaffinity chromatography techniques described 
below may also be used to isolate, purify or enrich 
polypeptides which have been linked to the polypep- 
tides encoded by the EST-related nucleic acids, posi- 
tional segments of EST-related nucleic acids or frag-
ments of positional segments of EST-related nucleic ac-
ids or to isolate, purify or enrich polypeptides which have
been linked to EST-related polypeptides, fragments of 
EST-related polypeptides, positional segments of EST- 
related polypeptides, or fragments of positional seg- 
ments of EST-related polypeptides. 

EXAMPLE 50 

Immunoaffinity Chromatography 

[0431] Antibodies prepared as described above are 
coupled to a support. Preferably, the antibodies are 
monoclonal antibodies, but polyclonal antibodies may
also be used. The support may be any of those typically
employed in immunoaffinity chromatography, including
Sepharose CL-4B (Pharmacia, Piscataway, NJ),
Sepharose CL-2B (Pharmacia, Piscataway, NJ), Affi-gel
10 (Biorad, Richmond, CA), or glass beads. 
[0432] The antibodies may be coupled to the support 
using any of the coupling reagents typically used in im-
munoaffinity chromatography, including cyanogen bro-
mide. After coupling the antibody to the support, the sup-
port is contacted with a sample which contains a target
polypeptide whose isolation, purification or enrichment
is desired. The target polypeptide may be a polypeptide
encoded by the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments of
positional segments of EST-related nucleic acids or the
target polypeptide may be one of the EST-related
polypeptides, fragments of EST-related polypeptides,
positional segments of EST-related polypeptides, or
fragments of positional segments of EST-related
polypeptides. The target polypeptides may also be
polypeptides which have been linked to the polypep-
tides encoded by the EST-related nucleic acids, posi-
tional segments of EST-related nucleic acids or frag-
ments of positional segments of EST-related nucleic ac-
ids or the target polypeptides may be polypeptides
which have been linked to EST-related polypeptides,
fragments of EST-related polypeptides, positional seg-
ments of EST-related polypeptides, or fragments of po-
sitional segments of EST-related polypeptides using the
fusion vectors described above.

[0433] Preferably, the sample is placed in contact with 
the support for a sufficient amount of time and under
appropriate conditions to allow at least 60% of the target 
polypeptide to specifically bind to the antibody coupled 
to the support. 

[0434] Thereafter, the support is washed with an ap-
propriate wash solution to remove polypeptides which
have non-specifically adhered to the support. The wash
solution may be any of those typically employed in im-
munoaffinity chromatography, including PBS, Tris-lithi-
um chloride buffer (0.1 M lysine base and 0.5 M lithium
chloride, pH 8.0), Tris-hydrochloride buffer (0.05 M Tris-
hydrochloride, pH 8.0), or Tris/Triton/NaCl buffer (50 mM
Tris·Cl, pH 8.0 or 9.0, 0.1% Triton X-100, and 0.5 M NaCl).
[0435] After washing, the specifically bound target 
polypeptide is eluted from the support using the high pH
or low pH elution solutions typically employed in immu-
noaffinity chromatography. In particular, the elution so-
lutions may contain an eluant such as triethanolamine,
diethylamine, calcium chloride, sodium thiocyanate, po-
tassium bromide, acetic acid, or glycine. In some em-
bodiments, the elution solution may also contain a de-
tergent such as Triton X-100 or octyl-β-D-glucoside.
[0436] The EST-related nucleic acids, positional seg-
ments of EST-related nucleic acids or fragments of po-
sitional segments of EST-related nucleic acids may also
be used to clone sequences located upstream of the
5' ESTs which are capable of regulating gene expres-
sion, including promoter sequences, enhancer se-
quences, and other upstream sequences which influ-
ence transcription or translation levels. Once identified
and cloned, these upstream regulatory sequences may
be used in expression vectors designed to direct the ex-
pression of an inserted gene in a desired spatial, tem-
poral, developmental, or quantitative fashion. Example
51 describes a method for cloning sequences upstream
of the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids.

2. Identification of upstream sequences with promoting 
or regulatory activities

EXAMPLE 51 

Use of EST-related nucleic acids, positional segments 
of EST-related nucleic acids or fragments of positional 
segments of EST-related nucleic acids to Clone 
Upstream Sequences from Genomic DNA 

[0437] Sequences derived from EST-related nucleic 
acids, positional segments of EST-related nucleic acids 
or fragments of positional segments of EST-related nu- 
cleic acids may be used to isolate the promoters of the 
corresponding genes using chromosome walking tech-
niques. In one chromosome walking technique, which
utilizes the GenomeWalker™ kit available from Clon-
tech, five complete genomic DNA samples are each di-
gested with a different restriction enzyme which has a 6
base recognition site and leaves a blunt end. Following 
digestion, oligonucleotide adapters are ligated to each 
end of the resulting genomic DNA fragments. 
[0438] For each of the five genomic DNA libraries, a
first PCR reaction is performed according to the manu-
facturer's instructions using an outer adapter primer pro-
vided in the kit and an outer gene specific primer. The
gene specific primer should be selected to be specific
for the 5' EST of interest and should have a melting temper-
ature, length, and location in the EST-related nucleic ac-
ids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids which is consistent with its use in PCR reactions.
Each first PCR reaction contains 5 ng of genomic DNA,
5 µl of 10X Tth reaction buffer, 0.2 mM of each dNTP,
0.2 µM each of outer adapter primer and outer gene spe-
cific primer, 1.1 mM of Mg(OAc)2, and 1 µl of the Tth
polymerase 50X mix in a total volume of 50 µl. The re-
action cycle for the first PCR reaction is as follows: 1
min at 94°C / 2 sec at 94°C, 3 min at 72°C (7 cycles) /
2 sec at 94°C, 3 min at 67°C (32 cycles) / 5 min at 67°C.
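
As a worked illustration of the reaction composition, the absolute amount of each component in one 50 µl reaction can be computed from its final concentration. The sketch below is arithmetic only; the function and variable names are illustrative and are not part of the kit protocol.

```python
def amount_pmol(concentration_um, volume_ul):
    """Amount in pmol: 1 µM equals 1 pmol per µl, so µM x µl = pmol."""
    return concentration_um * volume_ul


REACTION_VOLUME_UL = 50.0

# Final concentrations stated for the first PCR reaction, expressed in µM
FINAL_CONCENTRATIONS_UM = {
    "each dNTP": 200.0,     # 0.2 mM
    "each primer": 0.2,     # 0.2 µM
    "Mg(OAc)2": 1100.0,     # 1.1 mM
}

amounts = {
    name: amount_pmol(conc, REACTION_VOLUME_UL)
    for name, conc in FINAL_CONCENTRATIONS_UM.items()
}
```

Under these figures each reaction contains about 10 nmol of each dNTP, 10 pmol of each primer, and 55 nmol of Mg(OAc)2.
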
[0439] The product of the first PCR reaction is diluted
and used as a template for a second PCR reaction ac-
cording to the manufacturer's instructions using a pair
of nested primers which are located internally on the am-
plicon resulting from the first PCR reaction. For exam-
ple, 5 µl of the reaction product of the first PCR reaction
mixture may be diluted 180 times. Reactions are made
in a 50 µl volume having a composition identical to that
of the first PCR reaction except the nested primers are
used. The first nested primer is specific for the adapter,
and is provided with the GenomeWalker™ kit. The sec-
ond nested primer is specific for the particular EST-re-
lated nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of
EST-related nucleic acids for which the promoter is to
be cloned and should have a melting temperature,
length, and location in the EST-related nucleic acids, po-
sitional segments of EST-related nucleic acids or frag-
ments of positional segments of EST-related nucleic ac-
ids which is consistent with its use in PCR reactions.
The reaction parameters of the second PCR reaction
are as follows: 1 min at 94°C / 2 sec at 94°C, 3 min at
72°C (6 cycles) / 2 sec at 94°C, 3 min at 67°C (25 cycles)
/ 5 min at 67°C. The product of the second PCR reac-
tion is purified, cloned, and sequenced using standard
techniques.
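
The two cycling programs described above can be written down as data and checked, for example to count cycles or to estimate the programmed hold time. This is a sketch only: the representation is an assumption made for illustration, and time spent ramping between temperatures is ignored.

```python
# Each stage: (number of cycles, [(temperature in deg C, hold in seconds), ...])
FIRST_PCR = [
    (1, [(94, 60)]),
    (7, [(94, 2), (72, 180)]),
    (32, [(94, 2), (67, 180)]),
    (1, [(67, 300)]),
]

SECOND_PCR = [
    (1, [(94, 60)]),
    (6, [(94, 2), (72, 180)]),
    (25, [(94, 2), (67, 180)]),
    (1, [(67, 300)]),
]


def total_hold_seconds(program):
    """Sum of all programmed hold times, ignoring ramping."""
    return sum(cycles * sum(hold for _, hold in steps) for cycles, steps in program)


def amplification_cycles(program):
    """Number of two-step denaturation/extension cycles in the program."""
    return sum(cycles for cycles, steps in program if len(steps) > 1)
```

With these definitions the first program comprises 39 amplification cycles and the second 31, with programmed hold times of 7458 and 6002 seconds respectively.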

[0440] Alternatively, two or more human genomic
DNA libraries can be constructed by using two or more
restriction enzymes. The digested genomic DNA is
cloned into vectors which can be converted into single
stranded, circular, or linear DNA. A biotinylated oligonu-
cleotide comprising at least 15 nucleotides from the
EST-related nucleic acids, positional segments of EST-
related nucleic acids or fragments of positional seg-
ments of EST-related nucleic acids sequence is hybrid-
ized to the single stranded DNA. Hybrids between the
biotinylated oligonucleotide and the single stranded
DNA containing the EST-related nucleic acids, position-
al segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids are
isolated as described above. Thereafter, the single
stranded DNA containing the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids is released from the beads and converted into
double stranded DNA using a primer specific for the
EST-related nucleic acids, positional segments of EST-
related nucleic acids or fragments of positional seg-
ments of EST-related nucleic acids or a primer corre-
sponding to a sequence included in the cloning vector.
The resulting double stranded DNA is transformed into
bacteria. cDNAs containing the EST-related nucleic ac-
ids, positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids are identified by colony PCR or colony hybridiza-
tion.

[0441] Once the upstream genomic sequences have
been cloned and sequenced as described above, pro-
spective promoters and transcription start sites within
the upstream sequences may be identified by compar-
ing the sequences upstream of the EST-related nucleic
acids, positional segments of EST-related nucleic acids
or fragments of positional segments of EST-related nu-
cleic acids with databases containing known transcrip-
tion start sites, transcription factor binding sites, or pro-
moter sequences.
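
A much simplified version of such a comparison is sketched below: a single consensus motif, written with IUPAC ambiguity codes, is searched on both strands of an upstream sequence. The consensus TATAWAW and the example sequence are assumptions chosen for illustration; a database search would use many motifs or weight matrices rather than one pattern.

```python
import re

# Regular-expression equivalents of the IUPAC codes used in the example
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(upstream, consensus):
    """Return sorted (offset, strand) pairs for every match of the consensus.

    offset is the 0-based position, on the + strand, of the leftmost base
    covered by the site. Overlapping matches are reported.
    """
    body = "".join(IUPAC[base] for base in consensus)
    pattern = re.compile("(?=(" + body + "))")  # lookahead finds overlaps
    hits = [(m.start(), "+") for m in pattern.finditer(upstream)]
    n, k = len(upstream), len(consensus)
    for m in pattern.finditer(reverse_complement(upstream)):
        hits.append((n - m.start() - k, "-"))
    return sorted(hits)


# Hypothetical upstream fragment containing one TATA-like motif
sites = find_sites("GGCTATAAAAGGC", "TATAWAW")
```
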

[0442] In addition, promoters in the upstream se-
quences may be identified using promoter reporter vec-
tors as described in Example 53.

EXAMPLE 53

Identification of Promoters in Cloned Upstream
Sequences 

[0443] The genomic sequences upstream of the EST- 
related nucleic acids, positional segments of EST-relat- 
ed nucleic acids or fragments of positional segments of 
EST-related nucleic acids are cloned into a suitable pro-
moter reporter vector such as the pSEAP-Basic,
pSEAP-Enhancer, pβgal-Basic, pβgal-Enhancer, or
pEGFP-1 Promoter Reporter vectors available from
Clontech. Briefly, each of these promoter reporter vec-
tors includes multiple cloning sites positioned upstream
of a reporter gene encoding a readily assayable protein
such as secreted alkaline phosphatase, β galactosi-
dase, or green fluorescent protein. The sequences up-
stream of the EST-related nucleic acids, positional seg-
ments of EST-related nucleic acids or fragments of po-
sitional segments of EST-related nucleic acids are in-
serted into the cloning sites upstream of the reporter
gene in both orientations and introduced into an appro-
priate host cell. The level of reporter protein is assayed
and compared to the level obtained from a vector which
lacks an insert in the cloning site. The presence of an
elevated expression level in the vector containing the
insert with respect to the control vector indicates the
presence of a promoter in the insert. If necessary, the
upstream sequences can be cloned into vectors which
contain an enhancer for augmenting transcription levels
from weak promoter sequences. A significant level of
expression above that observed with the vector lacking
an insert indicates that a promoter sequence is present
in the inserted upstream sequence.
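
The comparison between insert-containing and insert-free vectors can be expressed as a fold change, as in the following sketch. The replicate readings and the threshold of 3 are hypothetical; the description above does not state what level of expression is to be regarded as significant.

```python
from statistics import mean


def fold_over_control(insert_readings, control_readings):
    """Mean reporter signal with insert divided by mean signal without insert."""
    control = mean(control_readings)
    if control <= 0:
        raise ValueError("control signal must be positive")
    return mean(insert_readings) / control


def indicates_promoter(fold, threshold=3.0):
    """True if the fold change meets an (assumed) significance threshold."""
    return fold >= threshold


# Hypothetical replicate reporter readings, arbitrary units
control_vector = [4.0, 4.0, 4.0]
forward_insert = [30.0, 34.0, 32.0]
reverse_insert = [5.0, 4.0, 6.0]

fold_forward = fold_over_control(forward_insert, control_vector)
fold_reverse = fold_over_control(reverse_insert, control_vector)
```

In this hypothetical case only the forward orientation exceeds the threshold, which is the pattern expected of a directional promoter.
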
[0444] Appropriate host cells for the promoter reporter 
vectors may be chosen based on the results of the 
above described determination of expression patterns 
of the EST-related nucleic acids, positional segments of 
EST-related nucleic acids or fragments of positional 
segments of EST-related nucleic acids. For example, if 
the expression pattern analysis indicates that the mRNA
corresponding to a particular EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids is expressed in fibroblasts, the promoter reporter
vector may be introduced into a human fibroblast cell 
line. 

[0445] Promoter sequences within the upstream ge-
nomic DNA may be further defined by constructing nest-
ed deletions in the upstream DNA using conventional
techniques such as Exonuclease III digestion. The re-
sulting deletion fragments can be inserted into the pro-
moter reporter vector to determine whether the deletion
has reduced or obliterated promoter activity. In this way,
the boundaries of the promoters may be defined. If de-
sired, potential individual regulatory sites within the pro-
moter may be identified using site directed mutagenesis
or linker scanning to obliterate potential transcription
factor binding sites within the promoter individually or in
combination. The effects of these mutations on tran-
scription levels may be determined by inserting the mu-
tations into the cloning sites in the promoter reporter
vectors.
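
One way to read a nested deletion series is to look for the pair of adjacent constructs between which reporter activity is lost. The sketch below assumes 5' deletions described by the coordinate of their upstream end relative to the transcription start site, and an arbitrary cut-off of half the activity of the longest construct; both the data and the cut-off are hypothetical.

```python
def map_boundary(constructs, loss_fraction=0.5):
    """Locate the interval in which promoter activity is lost.

    constructs: list of (five_prime_end, activity), where five_prime_end is a
    negative coordinate relative to the transcription start site.
    Returns (longer_end, shorter_end) for the first adjacent pair in which
    activity falls below loss_fraction of the longest construct, else None.
    """
    ordered = sorted(constructs)  # most upstream end first
    cutoff = loss_fraction * ordered[0][1]
    for (end_a, act_a), (end_b, act_b) in zip(ordered, ordered[1:]):
        if act_a >= cutoff and act_b < cutoff:
            return end_a, end_b
    return None


# Hypothetical deletion series: (5' end of retained sequence, reporter activity)
series = [(-1000, 100.0), (-600, 95.0), (-300, 90.0), (-150, 20.0), (-50, 5.0)]
boundary = map_boundary(series)
```

Here the loss of activity between the -300 and -150 constructs would place an element required for promoter activity within that interval.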

EXAMPLE 54 

Cloning and Identification of Promoters 

[0446] Using the method described in Example 51
above with 5' ESTs, sequences upstream of several
genes were obtained. Using the primer pairs GGG AAG
ATG GAG ATA GTA TTG CCT G (SEQ ID NO: 15) and
GTG CCA TGT ACA TGA TAG AGA GAT TC (SEQ ID
NO: 16), the promoter having the internal designation
P13H2 (SEQ ID NO: 17) was obtained.
[0447] Using the primer pairs GTA CCA GGGG ACT
GTG ACC ATT GC (SEQ ID NO:18) and CTG TGA CCA
TTG CTC CCA AGA GAG (SEQ ID NO:19), the promot-
er having the internal designation P15B4 (SEQ ID NO:
20) was obtained.

[0448] Using the primer pairs CTG GGA TGG AAG 
GCA CGG TA (SEQ ID NO:21) and GAG ACC ACA 
CAG CTA GAC AA (SEQ ID NO:22), the promoter hav-
ing the internal designation P29B6 (SEQ ID NO:23) was
obtained. 
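
For illustration, the length and GC content of the two primers of SEQ ID NO: 15 and 16 can be tabulated as follows. The melting temperature estimate uses a basic GC-content formula, Tm = 64.9 + 41 x (G + C - 16.4) / N, which is an assumption made for this sketch; the text does not say how melting temperatures were evaluated, and nearest-neighbor methods generally give better estimates.

```python
def clean(seq):
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return "".join(seq.split()).upper()


def gc_count(seq):
    return seq.count("G") + seq.count("C")


def basic_tm(seq):
    """Rough melting temperature (deg C) from GC content; assumed formula."""
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / len(seq)


PRIMERS = {
    "SEQ ID NO: 15": clean("GGG AAG ATG GAG ATA GTA TTG CCT G"),
    "SEQ ID NO: 16": clean("GTG CCA TGT ACA TGA TAG AGA GAT TC"),
}

summary = {
    name: (len(seq), gc_count(seq), round(basic_tm(seq), 1))
    for name, seq in PRIMERS.items()
}
```
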

[0449] Figure 4 provides a schematic description of 
the promoters isolated and the way they are assembled
with the corresponding 5' tags. The upstream sequenc-
es were screened for the presence of motifs resembling
transcription factor binding sites or known transcription
start sites using the computer program MatInspector re-
lease 2.0, August 1996.

[0450] Figure 5 describes the transcription factor
binding sites present in each of these promoters. The
column labeled "matrice" provides the name of the Mat-
Inspector matrix used. The column labeled "position" pro-
vides the 5' position of the promoter site. Numeration of
the sequence starts from the transcription site as deter-
mined by matching the genomic sequence with the 5'
EST sequence. The column labeled "orientation" indi-
cates the DNA strand on which the site is found, with
the + strand being the coding strand as determined by
matching the genomic sequence with the sequence of
the 5' EST. The column labeled "score" provides the
MatInspector score found for this site. The column la-
beled "length" provides the length of the site in nucle-
otides. The column labeled "sequence" provides the se-
quence of the site found.

[0451] Bacterial clones containing plasmids contain-
ing the promoter sequences described above
are presently stored in the inventor's laboratories
under the internal identification numbers provided
above. The inserts may be recovered from the deposit-
ed materials by growing an aliquot of the appropriate
bacterial clone in the appropriate medium. The plasmid
DNA can then be isolated using plasmid isolation pro-
cedures familiar to those skilled in the art such as alka-
line lysis minipreps or large scale alkaline lysis plasmid
isolation procedures. If desired the plasmid DNA may
be further enriched by centrifugation on a cesium chlo-
ride gradient, size exclusion chromatography, or anion
exchange chromatography. The plasmid DNA obtained
using these procedures may then be manipulated using
standard cloning techniques familiar to those skilled in
the art. Alternatively, a PCR can be done with primers
designed at both ends of the inserted EST-related nu-
cleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-relat-
ed nucleic acids. The PCR product which corresponds
to the EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids can then be ma-
nipulated using standard cloning techniques familiar to
those skilled in the art.

[0452] The promoters and other regulatory sequenc- 
es located upstream of the EST-related nucleic acids, 
positional segments of EST-related nucleic acids or 
fragments of positional segments of EST-related nucleic 
acids may be used to design expression vectors capa- 
ble of directing the expression of an inserted gene in a 
desired spatial, temporal, developmental, or quantita-
tive manner. A promoter capable of directing the desired
spatial, temporal, developmental, and quantitative pat- 
terns may be selected using the results of the expres- 
sion analysis described above. For example, if a pro- 
moter which confers a high level of expression in muscle 
is desired, the promoter sequence upstream of EST-re- 
lated nucleic acids, positional segments of EST-related 
nucleic acids or fragments of positional segments of 
EST-related nucleic acids derived from an mRNA which 
is expressed at a high level in muscle, as determined
by the methods above, may be used in the expression 
vector. 

[0453] Preferably, the desired promoter is placed near 
multiple restriction sites to facilitate the cloning of the 
desired insert downstream of the promoter, such that the
promoter is able to drive expression of the inserted
gene. The promoter may be inserted in conventional nu-
cleic acid backbones designed for extrachromosomal
replication, integration into the host chromosomes or
transient expression. Suitable backbones for the
present expression vectors include retroviral back-
bones, backbones from eukaryotic episomes such as
SV40 or Bovine Papilloma Virus, backbones from bac-
terial episomes, or artificial chromosomes.
[0454] Preferably, the expression vectors also include 
a polyA signal downstream of the multiple restriction 
sites for directing the polyadenylation of mRNA tran- 
scribed from the gene inserted into the expression vec- 
tor. 

[0455] Following the identification of promoter se-
quences using the procedures of Examples 51-54, pro-
teins which interact with the promoter may be identified
as described in Example 55 below.



EXAMPLE 55 

Identification of Proteins Which Interact with Promoter
Sequences, Upstream Regulatory Sequences, or
mRNA

[0456] Sequences within the promoter region which
are likely to bind transcription factors may be identified
by homology to known transcription factor binding sites
or through conventional mutagenesis or deletion analy-
ses of reporter plasmids containing the promoter se-
quence. For example, deletions may be made in a re-
porter plasmid containing the promoter sequence of in-
terest operably linked to an assayable reporter gene.
The reporter plasmids carrying various deletions within
the promoter region are transfected into an appropriate
host cell and the effects of the deletions on expression
levels are assessed. Transcription factor binding sites
within the regions in which deletions reduce expression
levels may be further localized using site directed mu-
tagenesis, linker scanning analysis, or other techniques
familiar to those skilled in the art.
[0457] Nucleic acids encoding proteins which interact
with sequences in the promoter may be identified using
one-hybrid systems such as those described in the man-
ual accompanying the Matchmaker One-Hybrid System
kit available from Clontech (Catalog No. K1603-1).
Briefly, the Matchmaker One-hybrid system is used as
follows. The target sequence for which it is desired to
identify binding proteins is cloned upstream of a selecta-
ble reporter gene and integrated into the yeast genome.
Preferably, multiple copies of the target sequences are
inserted into the reporter plasmid in tandem. A library
comprised of fusions between cDNAs to be evaluated
for the ability to bind to the promoter and the activation
domain of a yeast transcription factor, such as GAL4, is
transformed into the yeast strain containing the integrat-
ed reporter sequence. The yeast are plated on selective
media to select cells expressing the selectable marker
linked to the promoter sequence. The colonies which
grow on the selective media contain genes encoding
proteins which bind the target sequence. The inserts in
the genes encoding the fusion proteins are further char-
acterized by sequencing. In addition, the inserts may be
inserted into expression vectors or in vitro transcription
vectors. Binding of the polypeptides encoded by the in-
serts to the promoter DNA may be confirmed by tech-
niques familiar to those skilled in the art, such as gel
shift analysis or DNAse protection analysis.

VII. Use of EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids
in Gene Therapy

[0458] The present invention also comprises the use
of EST-related nucleic acids, positional segments of
EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids in gene therapy
strategies, including antisense and triple helix strategies
as described in Examples 56 and 57 below. In antisense
approaches, nucleic acid sequences complementary to
an mRNA are hybridized to the mRNA intracellularly,
thereby blocking the expression of the protein encoded
by the mRNA. The antisense sequences may prevent
gene expression through a variety of mechanisms. For
example, the antisense sequences may inhibit the abil- 
ity of ribosomes to translate the mRNA. Alternatively, the 
antisense sequences may block transport of the mRNA 
from the nucleus to the cytoplasm, thereby limiting the 
amount of mRNA available for translation. Another 
mechanism through which antisense sequences may 
inhibit gene expression is by interfering with mRNA 
splicing. In yet another strategy, the antisense nucleic 
acid may be incorporated in a ribozyme capable of spe- 
cifically cleaving the target mRNA. 

EXAMPLE 56 

Preparation and Use of Antisense Oligonucleotides 

[0459] The antisense nucleic acid molecules to be 
used in gene therapy may be either DNA or RNA se- 
quences. They may comprise a sequence complemen- 
tary to the sequence of the EST-related nucleic acids, 
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids. The antisense nucleic acids should have a length
and melting temperature sufficient to permit formation
of an intracellular duplex with sufficient stability to inhibit
the expression of the mRNA in the duplex. Strategies
for designing antisense nucleic acids suitable for use in
gene therapy are disclosed in Green et al., Ann. Rev.
Biochem. 55:569-597 (1986) and Izant and Weintraub,
Cell 36:1007-1015 (1984).
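
The complementarity requirement can be illustrated with a short sketch that derives an antisense DNA oligonucleotide from a stretch of mRNA. The input sequence is a hypothetical example; choosing the target region, the length, and any backbone modifications is outside the scope of this illustration.

```python
# Base pairing used to build a DNA strand complementary to RNA or DNA input
_COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_dna(mrna_segment):
    """Return the DNA oligonucleotide, written 5'->3', that is complementary
    to an mRNA (or sense DNA) segment given 5'->3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(mrna_segment.upper()))


def gc_fraction(oligo):
    """Fraction of G and C bases, a rough indicator of duplex stability."""
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


# Hypothetical target: the first nine bases of a coding region
oligo = antisense_dna("AUGGCCAAG")
```

Applying the function twice to a DNA sequence returns the original sequence, which is a convenient check that the oligonucleotide is the exact reverse complement of its target.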

[0460] In some strategies, antisense molecules are 
obtained from a nucleotide sequence encoding a protein 
by reversing the orientation of the coding region with re- 
spect to a promoter so as to transcribe the opposite 
strand from that which is normally transcribed in the cell. 
The antisense molecules may be transcribed using in 
vitro transcription systems such as those which employ 
T7 or SP6 polymerase to generate the transcript. An- 
other approach involves transcription of the antisense 
nucleic acids in vivo by operably linking DNA containing
the antisense sequence to a promoter in an expression 
vector. 

[0461] Alternatively, oligonucleotides which are com-
plementary to the strand normally transcribed in the cell
may be synthesized in vitro. Thus, the antisense nucleic
acids are complementary to the corresponding mRNA
and are capable of hybridizing to the mRNA to create a
duplex. In some embodiments, the antisense sequenc-
es may contain modified sugar phosphate backbones
to increase stability and make them less sensitive to
RNase activity. Examples of modifications suitable for
use in antisense strategies are described by Rossi et
al., Pharmacol. Ther. 50(2):245-254 (1991).
[0462] Various types of antisense oligonucleotides
complementary to the sequence of the EST-related nu-
cleic acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-relat-
ed nucleic acids may be used. In one preferred embod-
iment, stable and semi-stable antisense oligonucle-
otides described in International Application No. PCT
WO94/23026 are used. In these molecules, the 3' end
or both the 3' and 5' ends are engaged in intramolecular
hydrogen bonding between complementary base pairs.
These molecules are better able to withstand exonucle-
ase attacks and exhibit increased stability compared to
conventional antisense oligonucleotides.
[0463] In another preferred embodiment, the antisense
oligodeoxynucleotides against herpes simplex virus
types 1 and 2 described in International Application
No. WO 95/04141 are used.

[0464] In yet another preferred embodiment, the covalently
cross-linked antisense oligonucleotides described
in International Application No. WO 96/31523
are used. These double- or single-stranded oligonucleotides
comprise one or more, respectively, inter- or
intra-oligonucleotide covalent cross-linkages, wherein the
linkage consists of an amide bond between a primary
amine group of one strand and a carboxyl group of the
other strand or of the same strand, respectively, the primary
amine group being directly substituted in the 2' position
of the strand nucleotide monosaccharide ring, and
the carboxyl group being carried by an aliphatic spacer
group substituted on a nucleotide or nucleotide analog
of the other strand or the same strand, respectively.
[0465] The antisense oligodeoxynucleotides and oligonucleotides
disclosed in International Application No.
WO 92/18522 may also be used. These molecules are
stable to degradation and contain at least one transcription
control recognition sequence which binds to control
proteins and are effective as decoys therefor. These
molecules may contain "hairpin" structures, "dumbbell"
structures, "modified dumbbell" structures, "cross-linked"
decoy structures and "loop" structures.
[0466] In another preferred embodiment, the cyclic
double-stranded oligonucleotides described in European
Patent Application No. 0 572 287 A2 are used. These ligated
oligonucleotide "dumbbells" contain the binding site for 
a transcription factor and inhibit expression of the gene 
under control of the transcription factor by sequestering 
the factor. 

[0467] Use of the closed antisense oligonucleotides
disclosed in International Application No. WO 92/19732
is also contemplated. Because these molecules have
no free ends, they are more resistant to degradation by
exonucleases than are conventional oligonucleotides.
These oligonucleotides may be multifunctional, interacting
with several regions which are not adjacent to the
target mRNA.

[0468] The appropriate level of antisense nucleic acids
required to inhibit gene expression may be determined
using in vitro expression analysis. The antisense
molecule may be introduced into the cells by diffusion,
injection, infection or transfection using procedures
known in the art. For example, the antisense nucleic acids
can be introduced into the body as a bare or naked
oligonucleotide, oligonucleotide encapsulated in lipid,
oligonucleotide sequence encapsidated by viral protein,
or as an oligonucleotide operably linked to a promoter
contained in an expression vector. The expression vector
may be any of a variety of expression vectors known
in the art, including retroviral or viral vectors, vectors
capable of extrachromosomal replication, or integrating
vectors. The vectors may be DNA or RNA.
[0469] The antisense molecules are introduced onto 
cell samples at a number of different concentrations 
preferably between 1x10^-10 M and 1x10^-4 M. Once the minimum
concentration that can adequately control gene
expression is identified, the optimized dose is translated
into a dosage suitable for use in vivo. For example, an
inhibiting concentration in culture of 1x10^-7 translates into
a dose of approximately 0.6 mg/kg bodyweight. Levels
of oligonucleotide approaching 100 mg/kg bodyweight
or higher may be possible after testing the toxicity
of the oligonucleotide in laboratory animals. It is addi- 
tionally contemplated that cells from the vertebrate are 
removed, treated with the antisense oligonucleotide, 
and reintroduced into the vertebrate. 
[0470] It is further contemplated that the antisense ol- 
igonucleotide sequence is incorporated into a ribozyme 
sequence to enable the antisense to specifically bind 
and cleave its target mRNA. For technical applications 
of ribozyme and antisense oligonucleotides see Rossi 
et al., supra.

[0471] In a preferred application of this invention, the
polypeptide encoded by the gene is first identified, so
that the effectiveness of antisense inhibition on translation
can be monitored using techniques that include but
are not limited to antibody-mediated tests such as RIAs
and ELISA, functional assays, or radiolabeling.
[0472] The EST-related nucleic acids, positional seg- 
ments of EST-related nucleic acids or fragments of po- 
sitional segments of EST-related nucleic acids may also 
be used in gene therapy approaches based on intracel- 
lular triple helix formation. Triple helix oligonucleotides 
are used to inhibit transcription from a genome. They 
are particularly useful for studying alterations in cell activity
as it is associated with a particular gene. The EST-related
nucleic acids, positional segments of EST-related
nucleic acids or fragments of positional segments of
EST-related nucleic acids of the present invention or,
more preferably, a portion of those sequences, can be
used to inhibit gene expression in individuals having diseases
associated with expression of a particular gene.
Similarly, the EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids can be
used to study the effect of inhibiting transcription of a
particular gene within a cell. Traditionally, homopurine
sequences were considered the most useful for triple
helix strategies. However, homopyrimidine sequences
can also inhibit gene expression. Such homopyrimidine
oligonucleotides bind to the major groove at
homopurine:homopyrimidine sequences. Thus, both types of
sequences from the EST-related nucleic acids, positional
segments of EST-related nucleic acids or fragments
of positional segments of EST-related nucleic acids are
contemplated within the scope of this invention.

EXAMPLE 57 

Preparation and use of Triple Helix Probes 

[0473] The sequences of the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids are scanned to identify 10-mer to 20-mer homopyrimidine
or homopurine stretches which could be used
in triple-helix based strategies for inhibiting gene expression.
Following identification of candidate homopyrimidine
or homopurine stretches, their efficiency in inhibiting
gene expression is assessed by introducing varying
amounts of oligonucleotides containing the candidate
sequences into tissue culture cells which normally
express the target gene. The oligonucleotides may be
prepared on an oligonucleotide synthesizer or they may
be purchased commercially from a company specializing
in custom oligonucleotide synthesis, such as
GENSET, Paris, France.
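
The scanning step described in this example could be sketched in software as follows. This is a minimal illustration only, not a program named in this specification: the function name find_homo_stretches and the example sequence are hypothetical, and only the forward strand is scanned. Runs longer than 20 nucleotides are reported whole and would need to be trimmed to the desired length.

```python
import re

# Purines are A and G; pyrimidines are C and T.
# The lower bound of 10 nucleotides follows the paragraph above.
PURINE_RUN = re.compile(r"[AG]{10,}")
PYRIMIDINE_RUN = re.compile(r"[CT]{10,}")


def find_homo_stretches(seq):
    """Return (start, end, kind, subsequence) for each maximal run.

    Positions are 0-based and `end` is exclusive. RNA input is accepted
    by treating U as T. The function name is hypothetical.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for kind, pattern in (("homopurine", PURINE_RUN),
                          ("homopyrimidine", PYRIMIDINE_RUN)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), kind, match.group()))
    return sorted(hits)


# Made-up sequence containing one run of each kind.
example = "ATGC" + "AGGAAGGAGAGGAA" + "TCG" + "TCTCTTCCTCTTCC" + "AG"
for start, end, kind, sub in find_homo_stretches(example):
    print(start, end, kind, sub)
```

Each reported stretch is only a candidate; its efficiency in inhibiting gene expression would still be assessed in tissue culture as described above.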

[0474] The oligonucleotides may be introduced into 
the cells using a variety of methods known to those 
skilled in the art, including but not limited to calcium
phosphate precipitation, DEAE-Dextran, electroporation,
liposome-mediated transfection or native uptake.
[0475] Treated cells are monitored for altered cell 
function or reduced gene expression using techniques 
such as Northern blotting, RNase protection assays, or
PCR based strategies to monitor the transcription levels
of the target gene in cells which have been treated with
the oligonucleotide. The cell functions to be monitored
are predicted based upon the homologies of the target
genes corresponding to the EST-related nucleic acids,
positional segments of EST-related nucleic acids or
fragments of positional segments of EST-related nucleic
acids from which the oligonucleotides were derived with
known gene sequences that have been associated with
a particular function. The cell functions can also be predicted
based on the presence of abnormal physiologies
within cells derived from individuals with a particular inherited
disease, particularly when the EST-related nucleic
acids, positional segments of EST-related nucleic
acids or fragments of positional segments of EST-related
nucleic acids are associated with the disease using
techniques described herein.

[0476] The oligonucleotides which are effective in inhibiting
gene expression in tissue culture cells may then
be introduced in vivo using the techniques described
above and in Example 56 at a dosage calculated based
on the in vitro results, as described in Example 56.
[0477] In some embodiments, the natural (beta) anomers
of the oligonucleotide units can be replaced with
alpha anomers to render the oligonucleotide more resistant
to nucleases. Further, an intercalating agent
such as ethidium bromide, or the like, can be attached
to the 3' end of the alpha oligonucleotide to stabilize the
triple helix. For information on the generation of oligonucleotides
suitable for triple helix formation see Griffin
et al. (Science 245:967-971 (1989)).

EXAMPLE 58 

Use of EST-related nucleic acids, positional segments
of EST-related nucleic acids or fragments of positional
segments of EST-related nucleic acids to express an
Encoded Protein in a Host Organism

[0478] The EST-related nucleic acids, positional seg- 
ments of EST-related nucleic acids or fragments of po- 
sitional segments of EST-related nucleic acids may also 
be used to express an encoded protein or polypeptide 
in a host organism to produce a beneficial effect. In ad- 
dition, nucleic acids encoding the EST-related polypep- 
tides, positional segments of EST-related polypeptides 
or fragments of positional segments of EST-related 
polypeptides may be used to express the encoded pro- 
tein or polypeptide in a host organism to produce a ben- 
eficial effect. 

[0479] In such procedures, the encoded protein or 
polypeptide may be transiently expressed in the host or- 
ganism or stably expressed in the host organism. The 
encoded protein or polypeptide may have any of the ac- 
tivities described above. The encoded protein or 
polypeptide may be a protein or polypeptide which the 
host organism lacks or, alternatively, the encoded pro- 
tein may augment the existing levels of the protein in the 
host organism. 

[0480] In some embodiments in which the protein or 
polypeptide is secreted, nucleic acids encoding the full 
length protein (i.e. the signal peptide and the mature 
protein), or nucleic acids encoding only the mature pro- 
tein (i.e. the protein generated when the signal peptide 
is cleaved off) are introduced into the host organism.
[0481] The nucleic acids encoding the proteins or 
polypeptides may be introduced into the host organism 
using a variety of techniques known to those of skill in 
the art. For example, the extended cDNA may be injected
into the host organism as naked DNA such that the
encoded protein is expressed in the host organism, 
thereby producing a beneficial effect. 
[0482] Alternatively, the nucleic acids encoding the
protein or polypeptide may be cloned into an expression
vector downstream of a promoter which is active in the
host organism. The expression vector may be any of the
expression vectors designed for use in gene therapy,
including viral or retroviral vectors. The expression vector
may be directly introduced into the host organism
such that the encoded protein is expressed in the host
organism to produce a beneficial effect. In another approach,
the expression vector may be introduced into
cells in vitro. Cells containing the expression vector are 
thereafter selected and introduced into the host organ- 
ism, where they express the encoded protein or 
polypeptide to produce a beneficial effect. 

EXAMPLE 59 

Use of Signal Peptides To Import Proteins Into Cells 

[0483] The short core hydrophobic region (h) of signal
peptides encoded by the sequences of SEQ ID NOs:
24-652 and 3721-3811 may also be used as a carrier to
import a peptide or a protein of interest, so-called cargo,
into tissue culture cells (Lin et al., J. Biol. Chem., 270:
14225-14258 (1995); Du et al., J. Peptide Res., 51:
235-243 (1998); Rojas et al., Nature Biotech., 16:
370-375 (1998)).

[0484] When cell permeable peptides of limited size
(approximately up to 25 amino acids) are to be translocated
across the cell membrane, chemical synthesis may
be used in order to add the h region to either the C-terminus
or the N-terminus of the cargo peptide of interest.
Alternatively, when longer peptides or proteins are to be
imported into cells, nucleic acids can be genetically engineered,
using techniques familiar to those skilled in
the art, in order to link the extended cDNA sequence
encoding the h region to the 5' or the 3' end of a DNA
sequence coding for a cargo polypeptide. Such genetically
engineered nucleic acids are then translated either
in vitro or in vivo after transfection into appropriate cells,
using conventional techniques to produce the resulting
cell permeable polypeptide. Suitable host cells are
then simply incubated with the cell permeable polypeptide
which is then translocated across the membrane.

[0485] This method may be applied to study diverse
intracellular functions and cellular processes. For instance,
it has been used to probe functionally relevant
domains of intracellular proteins and to examine protein-protein
interactions involved in signal transduction pathways
(Lin et al., supra; Lin et al., J. Biol. Chem., 271:
5305-5308 (1996); Rojas et al., J. Biol. Chem., 271:
27456-27461 (1996); Liu et al., Proc. Natl. Acad. Sci.
USA, 93: 11819-11824 (1996); Rojas et al., Bioch. Biophys.
Res. Commun., 234: 675-680 (1997)).

[0486] Such techniques may be used in cellular therapy
to import proteins producing therapeutic effects. For
instance, cells isolated from a patient may be treated
with imported therapeutic proteins and then re-introduced
into the host organism.

[0487] Alternatively, the h region of signal peptides of
the present invention could be used in combination with
a nuclear localization signal to deliver nucleic acids into
the cell nucleus. Such oligonucleotides may be antisense
oligonucleotides or oligonucleotides designed to form
triple helixes, as described above, in order to inhibit
processing and maturation of a target cellular RNA.

EXAMPLE 60 

Computer Embodiments 

[0488] As used herein the term "nucleic acid codes of
SEQ ID NOs: 24-4100 and 8178-36681" encompasses
the nucleotide sequences of SEQ ID NOs: 24-4100 and
8178-36681, fragments of SEQ ID NOs: 24-4100 and
8178-36681, nucleotide sequences homologous to
SEQ ID NOs: 24-4100 and 8178-36681 or homologous
to fragments of SEQ ID NOs: 24-4100 and 8178-36681,
and sequences complementary to all of the preceding
sequences. The fragments include portions of SEQ ID
NOs: 24-4100 and 8178-36681 comprising at least 10,
15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400,
or 500 consecutive nucleotides of SEQ ID NOs: 24-4100
and 8178-36681. Preferably, the fragments are novel
fragments. Homologous sequences and fragments of
SEQ ID NOs: 24-4100 and 8178-36681 refer to a sequence
having at least 99%, 98%, 97%, 96%, 95%,
90%, 85%, 80%, or 75% homology to these sequences.
Homology may be determined using any of the computer
programs and parameters described in Example 18,
including BLAST2N with the default parameters or with
any modified parameters. Homologous sequences also
include RNA sequences in which uridines replace the
thymines in the nucleic acid codes of SEQ ID NOs:
24-4100 and 8178-36681. The homologous sequences
may be obtained using any of the procedures described
herein or may result from the correction of a sequencing
error as described above. It will be appreciated that the
nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681 can be represented in the traditional single
character format (See the inside back cover of Stryer,
Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co.,
New York.) or in any other format which records the identity
of the nucleotides in a sequence.
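
Several of the sequence forms enumerated in this definition (fragments of consecutive nucleotides, complementary sequences, and RNA sequences in which uridines replace thymines) can be derived mechanically from a sequence held in single character format. The following is a minimal sketch under that assumption; the helper names reverse_complement, to_rna and fragments are hypothetical and are not part of this specification, and the complementary strand is written in the usual 5' to 3' orientation.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_rna(seq):
    """RNA form of a DNA sequence: uridine (U) replaces thymine (T)."""
    return seq.upper().replace("T", "U")


def fragments(seq, length):
    """All fragments of `length` consecutive nucleotides of `seq`."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


# Made-up example sequence, not one of the SEQ ID NOs.
dna = "ATGCA"
print(reverse_complement(dna))   # complementary strand
print(to_rna(dna))               # RNA form
print(fragments(dna, 3))         # every 3-nucleotide fragment
```

In practice the fragment length would be one of the values listed above (10, 15, 20 and so on); a length of 3 is used here only to keep the example short.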
[0489] As used herein the term "polypeptide codes of
SEQ ID NOs: 4101-8177" encompasses the polypeptide
sequences of SEQ ID NOs: 4101-8177 which are encoded
by the 5' ESTs of SEQ ID NOs: 24-4100 and
8178-36681, polypeptide sequences homologous to the
polypeptides of SEQ ID NOs: 4101-8177, or fragments
of any of the preceding sequences. Homologous
polypeptide sequences refer to a polypeptide sequence
having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%,
80%, or 75% homology to one of the polypeptide sequences
of SEQ ID NOs: 4101-8177. Homology may be determined
using any of the computer programs and parameters
described herein, including FASTA with the
default parameters or with any modified parameters.
The homologous sequences may be obtained using any
of the procedures described herein or may result from
the correction of a sequencing error as described above.
The polypeptide fragments comprise at least 5, 10, 15,
20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino
acids of the polypeptides of SEQ ID NOs: 4101-8177.
Preferably, the fragments are novel fragments. It will be
appreciated that the polypeptide codes of the SEQ ID
NOs: 4101-8177 can be represented in the traditional
single character format or three letter format (See the
inside back cover of Stryer, Lubert. Biochemistry, 3rd
edition. W. H. Freeman & Co., New York.) or in any other
format which relates the identity of the polypeptides in
a sequence.

[0490] It will be appreciated by those skilled in the art
that the nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681 and polypeptide codes of SEQ ID NOs:
4101-8177 can be stored, recorded, and manipulated
on any medium which can be read and accessed by a
computer. As used herein, the words "recorded" and
"stored" refer to a process for storing information on a
computer medium. A skilled artisan can readily adopt
any of the presently known methods for recording information
on a computer readable medium to generate
manufactures comprising one or more of the nucleic acid
codes of SEQ ID NOs: 24-4100 and 8178-36681, or one
or more of the polypeptide codes of SEQ ID NOs:
4101-8177. Another aspect of the present invention is
a computer readable medium having recorded thereon
at least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes
of SEQ ID NOs: 24-4100 and 8178-36681. Another aspect
of the present invention is a computer readable medium
having recorded thereon at least 2, 5, 10, 15, 20,
25, 30, or 50 polypeptide codes of SEQ ID NOs:
4101-8177.

[0491] Computer readable media include magnetically
readable media, optically readable media, electronically
readable media and magnetic/optical media. For
example, the computer readable media may be a hard
disc, a floppy disc, a magnetic tape, CD-ROM, DVD,
RAM, or ROM as well as other types of media
known to those skilled in the art.
[0492] Embodiments of the present invention include
systems, particularly computer systems which contain
the sequence information described herein. As used
herein, "a computer system" refers to the hardware
components, software components, and data storage
components used to analyze the nucleotide sequences
of the nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681, or the amino acid sequences of the
polypeptide codes of SEQ ID NOs: 4101-8177. The
computer system preferably includes the computer
readable media described above, and a processor for
accessing and manipulating the sequence data.
[0493] Preferably, the computer is a general purpose
system that comprises a central processing unit (CPU),
one or more data storage components for storing data,
and one or more data retrieving devices for retrieving
the data stored on the data storage components. A
skilled artisan can readily appreciate that any one of the
currently available computer systems is suitable.
[0494] In one particular embodiment, the computer
system includes a processor connected to a bus which
is connected to a main memory (preferably implemented
as RAM) and one or more data storage devices, such
as a hard drive and/or other computer readable media 
having data recorded thereon. In some embodiments, 
the computer system further includes one or more data 
retrieving devices for reading the data stored on the data 
storage components. The data retrieving device may 
represent, for example, a floppy disk drive, a compact
disk drive, a magnetic tape drive, etc. In some embodi- 
ments, the data storage component is a removable com- 
puter readable medium such as a floppy disk, a compact 
disk, a magnetic tape, etc. containing control logic and/ 
or data recorded thereon. The computer system may 
advantageously include or be programmed by appropriate
software for reading the control logic and/or the data
from the data storage component once inserted in the
data retrieving device. Software for accessing and
processing the nucleotide sequences of the nucleic acid 
codes of SEQ ID NOs: 24-4100 and 8178-36681, or the 
amino acid sequences of the polypeptide codes of SEQ 
ID NOs: 4101-8177 (such as search tools, compare 
tools, and modeling tools etc.) may reside in main mem- 
ory during execution. 

[0495] In some embodiments, the computer system 
may further comprise a sequence comparer for compar- 
ing the above-described nucleic acid codes of SEQ ID 
NOs: 24-4100 and 8178-36681 or polypeptide codes of 
SEQ ID NOs: 4101-8177 stored on a computer readable
medium to reference nucleotide or polypeptide se- 
quences stored on a computer readable medium. A "se- 
quence comparer" refers to one or more programs 
which are implemented on the computer system to com- 
pare a nucleotide or polypeptide sequence with other 
nucleotide or polypeptide sequences and/or com- 
pounds including but not limited to peptides, peptidomi- 
metics, and chemicals stored within the data storage 
means. For example, the sequence comparer may compare
the nucleotide sequences of the nucleic acid codes
of SEQ ID NOs: 24-4100 and 8178-36681, or the amino
acid sequences of the polypeptide codes of SEQ ID 
NOs: 4101-8177 stored on a computer readable medi- 
um to reference sequences stored on a computer read- 
able medium to identify homologies, motifs implicated 
in biological function, or structural motifs. The various 
sequence comparer programs identified elsewhere in 
this patent specification are particularly contemplated 
for use in this aspect of the invention. 

[0496] Accordingly, one aspect of the present inven- 
tion is a computer system comprising a processor, a da- 
ta storage device having stored thereon a nucleic acid 
code of SEQ ID NOs: 24-4100 and 8178-36681 or a 
polypeptide code of SEQ ID NOs: 4101-8177, a data
storage device having retrievably stored thereon reference
nucleotide sequences or polypeptide sequences
to be compared to the nucleic acid code of SEQ ID NOs:
24-4100 and 8178-36681 or polypeptide code of SEQ
ID NOs: 4101-8177, and a sequence comparer for conducting
the comparison. The sequence comparer may
indicate a homology level between the sequences compared
or identify structural motifs in the above described
nucleic acid code of SEQ ID NOs: 24-4100 and
8178-36681 and polypeptide codes of SEQ ID NOs:
4101-8177 or it may identify structural motifs in sequences
which are compared to these nucleic acid
codes and polypeptide codes. In some embodiments,
the data storage device may have stored thereon the
sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of
the nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681 or polypeptide codes of SEQ ID NOs:
4101-8177.

[0497] Another aspect of the present invention is a
method for determining the level of homology between
a nucleic acid code of SEQ ID NOs: 24-4100 and
8178-36681 and a reference nucleotide sequence,
comprising the steps of reading the nucleic acid code
and the reference nucleotide sequence through the use
of a computer program which determines homology levels
and determining homology between the nucleic acid
code and the reference nucleotide sequence with the
computer program. The computer program may be any
of a number of computer programs for determining homology
levels, including those specifically enumerated
herein, including BLAST2N with the default parameters
or with any modified parameters. The method may be
implemented using the computer systems described
above. The method may also be performed by reading
2, 5, 10, 15, 20, 25, 30, or 50 of the above described
nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681 through use of the computer program and
determining homology between the nucleic acid codes
and reference nucleotide sequences.
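
By way of illustration only, the two steps of this method (reading the sequences and determining a homology level) can be sketched with a simple global alignment followed by a percent identity calculation. This sketch is a stand-in and is not BLAST2N or any other program enumerated herein; the scoring values (match +1, mismatch -1, gap -1) and the function names global_alignment and percent_identity are assumptions chosen for the example.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    aln_a, aln_b = global_alignment(a.upper(), b.upper())
    if not aln_a:
        return 0.0
    matches = sum(x == y for x, y in zip(aln_a, aln_b) if x != "-")
    return 100.0 * matches / len(aln_a)


# Made-up sequences, not taken from the SEQ ID NOs.
print(percent_identity("ACGT", "ACCT"))
```

The returned percentage could then be compared against a chosen threshold such as the 75% to 99% homology levels recited above. A heuristic local aligner of the BLAST family scores and reports matches differently, so values from this sketch are not interchangeable with BLAST2N output.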

[0498] Alternatively, the computer program may be a
computer program which compares the nucleotide sequences
of the nucleic acid codes of the present invention
to reference nucleotide sequences in order to determine
whether the nucleic acid code of SEQ ID NOs:
24-4100 and 8178-36681 differs from a reference nucleic
acid sequence at one or more positions. Optionally
such a program records the length and identity of inserted,
deleted or substituted nucleotides with respect to the
sequence of either the reference polynucleotide or the
nucleic acid code of SEQ ID NOs: 24-4100 and
8178-36681. In one embodiment, the computer program
may be a program which determines whether the
nucleotide sequences of the nucleic acid codes of SEQ
ID NOs: 24-4100 and 8178-36681 contain a single nucleotide
polymorphism (SNP) with respect to a reference
nucleotide sequence. This single nucleotide polymorphism
may comprise a single base substitution, insertion,
or deletion.
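
A program of the kind described in this paragraph could record differences as sketched below. The sketch assumes that the nucleic acid code and the reference sequence have already been aligned by some other program, with gaps written as "-" so that both strings have equal length; the function name record_differences and the tuple layout are hypothetical.

```python
def record_differences(aligned_ref, aligned_query):
    """List single-column differences between two pre-aligned sequences.

    Each entry is (kind, ref_pos, ref_base, query_base), where ref_pos is
    the 0-based position in the ungapped reference. For an insertion,
    ref_pos is the reference position before which the base is inserted.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    ref_pos = 0
    for r, q in zip(aligned_ref.upper(), aligned_query.upper()):
        if r == "-" and q == "-":
            continue                      # empty column, nothing to record
        if r == "-":
            diffs.append(("insertion", ref_pos, "", q))
        elif q == "-":
            diffs.append(("deletion", ref_pos, r, ""))
        elif r != q:
            diffs.append(("substitution", ref_pos, r, q))
        if r != "-":
            ref_pos += 1                  # advance only over reference bases
    return diffs


# Made-up aligned pair: one substitution, one insertion, one deletion.
for diff in record_differences("ACGTAC-GT", "ACCTACTG-"):
    print(diff)
```

Each entry covers a single alignment column, which corresponds to the single base substitution, insertion, or deletion described above. Adjacent insertion or deletion columns would have to be merged to report the full length of a longer inserted or deleted segment.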

[0499] Another aspect of the present invention is a
method for determining the level of homology between
a polypeptide code of SEQ ID NOs: 4101-8177 and a
reference polypeptide sequence, comprising the steps
of reading the polypeptide code of SEQ ID NOs:
4101-8177 and the reference polypeptide sequence 
through use of a computer program which determines 
homology levels and determining homology between 
the polypeptide code and the reference polypeptide se- 
quence using the computer program. 
[0500] Accordingly, another aspect of the present in- 
vention is a method for determining whether a nucleic 
acid code of SEQ ID NOs: 24-4100 and 8178-36681 differs
at one or more nucleotides from a reference nucleotide
sequence comprising the steps of reading the nucleic
acid code and the reference nucleotide sequence
through use of a computer program which identifies dif- 
ferences between nucleic acid sequences and identify- 
ing differences between the nucleic acid code and the 
reference nucleotide sequence with the computer pro- 
gram. In some embodiments, the computer program is 
a program which identifies single nucleotide polymorphisms.
The method may be implemented by the computer
systems described above. The method may also
be performed by reading at least 2, 5, 10, 15, 20, 25, 30,
or 50 of the nucleic acid codes of SEQ ID NOs: 24-4100
and 8178-36681 and the reference nucleotide sequenc- 
es through the use of the computer program and iden- 
tifying differences between the nucleic acid codes and 
the reference nucleotide sequences with the computer 
program. 

[0501] In other embodiments the computer based 
system may further comprise an identifier for identifying 
features within the nucleotide sequences of the nucleic 
acid codes of SEQ ID NOs: 24-4100 and 8178-36681 
or the amino acid sequences of the polypeptide codes 
of SEQ ID NOs: 4101-8177. 

[0502] An "identifier" refers to one or more programs
which identify certain features within the above-described
nucleotide sequences of the nucleic acid codes
of SEQ ID NOs: 24-4100 and 8178-36681 or the amino
acid sequences of the polypeptide codes of SEQ ID
NOs: 4101-8177. In one embodiment, the identifier may
comprise a program which identifies an open reading
frame in the cDNA codes of SEQ ID NOs: 24-4100 and
8178-36681.
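
An identifier of the open reading frame type could, for example, operate as sketched below. This is a simplified illustration and not a program named in this specification: it assumes that an open reading frame begins at an ATG codon and ends at the first in-frame stop codon, it scans only the three forward frames, and the names find_orfs and min_codons are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) for each ATG-to-stop reading frame.

    Positions are 0-based, `end` is exclusive and includes the stop
    codon. `min_codons` counts coding codons, excluding the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                 # first start codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None              # look for the next ATG in this frame
    return orfs


# Made-up sequence with one reading frame in frame 2.
example = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(example):
    print(start, end, frame, example[start:end])
```

To cover the complementary strand as well, the same function could be applied to the reverse complement of the sequence. Reading frames that run off the end of a partial sequence such as a 5' EST without reaching a stop codon are not reported by this sketch.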

[0503] In another embodiment, the identifier may
comprise a molecular modeling program which determines
the 3-dimensional structure of the polypeptide
codes of SEQ ID NOs: 4101-8177. In some embodiments,
the molecular modeling program identifies target
sequences that are most compatible with profiles repre- 
senting the structural environments of the residues in 
known three-dimensional protein structures. (See, e.g., 
Eisenberg et al., U.S. Patent No. 5,436,850 issued July 
25, 1995). In another technique, the known three-di- 
mensional structures of proteins in a given family are 
superimposed to define the structurally conserved regions
in that family. This protein modeling technique also
uses the known three-dimensional structure of a homologous
protein to approximate the structure of the
polypeptide codes of SEQ ID NOs: 4101-8177. (See e.g.,
Srinivasan, et al., U.S. Patent No. 5,557,535 issued
September 17, 1996). Conventional homology modeling
techniques have been used routinely to build models
of proteases and antibodies. (Sowdhamini et al.,
Protein Engineering 10:207, 215 (1997)). Comparative
approaches can also be used to develop three-dimensional
protein models when the protein of interest has
poor sequence identity to template proteins. In some
cases, proteins fold into similar three-dimensional structures
despite having very weak sequence identities. For
example, the three-dimensional structures of a number
of helical cytokines fold in similar three-dimensional topology
in spite of weak sequence homology.
[0504] The recent development of threading methods
now enables the identification of likely folding patterns
in a number of situations where the structural relatedness
between target and template(s) is not detectable
at the sequence level. Hybrid methods may also be used,
in which fold recognition is performed using Multiple
Sequence Threading (MST), structural equivalencies are
deduced from the threading output using a distance
geometry program DRAGON to construct a low resolution
model, and a full-atom representation is constructed
using a molecular modeling package such as QUANTA.

[0505] According to this 3-step approach, candidate
templates are first identified by using the novel fold recognition
algorithm MST, which is capable of performing
simultaneous threading of multiple aligned sequences
onto one or more 3-D structures. In a second step, the
structural equivalencies obtained from the MST output
are converted into interresidue distance restraints and
fed into the distance geometry program DRAGON, together
with auxiliary information obtained from secondary
structure predictions. The program combines the restraints
in an unbiased manner and rapidly generates a
large number of low resolution model conformations. In
a third step, these low resolution model conformations
are converted into full-atom models and subjected to energy
minimization using the molecular modeling package
QUANTA. (See e.g., Aszódi et al., Proteins: Structure,
Function, and Genetics, Supplement 1:38-42
(1997)).

[0506] The results of the molecular modeling analysis may then
be used in rational drug design techniques to identify agents
which modulate the activity of the polypeptide codes of SEQ ID
NOs: 4101-8177.
[0507] Accordingly, another aspect of the present invention is
a method of identifying a feature within the nucleic acid
codes of SEQ ID NOs: 24-4100 and 8178-36681 or the polypeptide
codes of SEQ ID NOs: 4101-8177 comprising reading the nucleic
acid code(s) or the polypeptide code(s) through the use of a
computer program which identifies features therein and
identifying features within the nucleic acid code(s) or
polypeptide code(s) with the computer program. In one
embodiment, the computer program comprises a computer program
which identifies open reading frames. In a further embodiment,
the computer program identifies structural motifs in a
polypeptide sequence. In another embodiment, the computer
program comprises a molecular modeling program. The method may
be performed by reading a single sequence or at least 2, 5,
10, 15, 20, 25, 30, or 50 of the nucleic acid codes of SEQ ID
NOs: 24-4100 and 8178-36681 or the polypeptide codes of SEQ ID
NOs: 4101-8177 through the use of the computer program and
identifying features within the nucleic acid codes or
polypeptide codes with the computer program.
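
By way of illustration only, a computer program which identifies open reading frames might be sketched as follows. The sketch is not taken from the disclosure: the function name find_orfs, the min_codons parameter, the restriction to the forward strand, and the use of the standard ATG start codon and TAA/TAG/TGA stop codons are all assumptions made for the example.

```python
from typing import List, Tuple

# Standard stop codons; an assumption of this sketch, not of the disclosure.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int, int]]:
    """Scan the three forward reading frames of a nucleic acid code.

    Returns (frame, start, end) tuples, where start is the 0-based index
    of the ATG and end is the index just past the stop codon. An ORF is
    reported only if it spans at least min_codons codons, stop included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    # Hypothetical input sequence, used only to exercise the function.
    for frame, start, end in find_orfs("CCATGAAATTTTAGGG"):
        print(frame, start, end)
```

A fuller implementation would also scan the complementary strand and would typically apply a longer minimum length; both are omitted here for brevity.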

[0508] The nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681 or the polypeptide codes of SEQ ID NOs: 4101-8177
may be stored and manipulated in a variety of data processor
programs in a variety of formats. For example, the nucleic
acid codes of SEQ ID NOs: 24-4100 and 8178-36681 or the
polypeptide codes of SEQ ID NOs: 4101-8177 may be stored as
text in a word processing file, such as Microsoft WORD or
WORDPERFECT, or as an ASCII file in a variety of database
programs familiar to those of skill in the art, such as DB2,
SYBASE, or ORACLE. In addition, many computer programs and
databases may be used as sequence comparers, identifiers, or
sources of reference nucleotide or polypeptide sequences to be
compared to the nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681 or the polypeptide codes of SEQ ID NOs: 4101-8177.
The following list is intended not to limit the invention but
to provide guidance to programs and databases which are useful
with the nucleic acid codes of SEQ ID NOs: 24-4100 and
8178-36681 or the polypeptide codes of SEQ ID NOs: 4101-8177.
The programs and databases which may be used include, but are
not limited to: MacPattern (EMBL), DiscoveryBase (Molecular
Applications Group), GeneMine (Molecular Applications Group),
Look (Molecular Applications Group), MacLook (Molecular
Applications Group), BLAST and BLAST2 (NCBI), BLASTN and
BLASTX (Altschul et al., J. Mol. Biol. 215: 403 (1990)), FASTA
(Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85: 2444
(1988)), FASTDB (Brutlag et al., Comp. App. Biosci. 6:237-245,
1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE
(Molecular Simulations Inc.), Cerius2.DBAccess (Molecular
Simulations Inc.), HypoGen (Molecular Simulations Inc.),
Insight II (Molecular Simulations Inc.), Discover (Molecular
Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix
(Molecular Simulations Inc.), DelPhi (Molecular Simulations
Inc.), QuanteMM (Molecular Simulations Inc.), Homology
(Molecular Simulations Inc.), Modeler (Molecular Simulations
Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein
Design (Molecular Simulations Inc.), WebLab (Molecular
Simulations Inc.), WebLab Diversity Explorer (Molecular
Simulations Inc.), Gene Explorer (Molecular Simulations Inc.),
SeqFold (Molecular Simulations Inc.), the EMBL/Swissprotein
database, the MDL Available Chemicals Directory database, the
MDL Drug Data Report data base, the Comprehensive Medicinal
Chemistry database, Derwent's World Drug Index database, the
BioByteMasterFile database, the Genbank database, and the
Genseqn database. Many other programs and data bases would be
apparent to one of skill in the art given the present
disclosure.
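
As a purely illustrative sketch of what a sequence comparer does once a stored code and a reference sequence have been aligned, the following reports the positions at which the two differ. It does not reproduce any of the programs listed above: the function names are hypothetical, and the sketch assumes the two sequences have already been aligned to equal length by some other tool.

```python
from typing import List, Tuple


def sequence_differences(query: str, reference: str) -> List[Tuple[int, str, str]]:
    """List (position, reference_residue, query_residue) for each mismatch.

    Positions are 1-based. The sequences are assumed to be pre-aligned,
    so they must be non-empty and of equal length.
    """
    query, reference = query.upper(), reference.upper()
    if not query or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    return [
        (pos, ref_res, qry_res)
        for pos, (qry_res, ref_res) in enumerate(zip(query, reference), start=1)
        if qry_res != ref_res
    ]


def percent_identity(query: str, reference: str) -> float:
    """Percentage of aligned positions at which the residues are identical."""
    mismatches = len(sequence_differences(query, reference))
    return 100.0 * (len(query) - mismatches) / len(query)


if __name__ == "__main__":
    # Hypothetical sequences, used only to exercise the functions.
    print(sequence_differences("ACGTAC", "ACCTAA"))
    print(round(percent_identity("ACGTAC", "ACCTAA"), 1))
```

Programs such as BLAST or FASTA additionally perform the alignment itself and score gaps, which this sketch deliberately leaves out.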

[0509] Motifs which may be detected using the above programs
include sequences encoding leucine zippers, helix-turn-helix
motifs, glycosylation sites, ubiquitination sites, alpha
helices, and beta sheets, signal sequences encoding signal
peptides which direct the secretion of the encoded proteins,
sequences implicated in transcription regulation such as
homeoboxes, acidic stretches, enzymatic active sites,
substrate binding sites, and enzymatic cleavage sites.
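
The following minimal sketch illustrates how two such motifs might be located in a polypeptide code by pattern matching. It is not one of the programs named above. The patterns are the widely used consensus forms for N-glycosylation sites (N, any residue but P, S or T, any residue but P) and leucine zippers (four leucines spaced seven residues apart); the function and dictionary names are hypothetical.

```python
import re
from typing import List, Tuple

# Consensus patterns wrapped in lookaheads so that overlapping hits are found.
MOTIFS = {
    "n_glycosylation": re.compile(r"(?=(N[^P][ST][^P]))"),
    "leucine_zipper": re.compile(r"(?=(L.{6}L.{6}L.{6}L))"),
}


def scan_motifs(protein: str) -> List[Tuple[str, int, str]]:
    """Return (motif_name, 1-based start position, matched residues)."""
    protein = protein.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(protein):
            hits.append((name, match.start() + 1, match.group(1)))
    # Order hits by position, then by motif name, for stable output.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical polypeptide, used only to exercise the function.
    for hit in scan_motifs("MKNGSAPNPTQ"):
        print(hit)
```

A pattern match of this kind only flags candidate sites; whether a site is actually glycosylated, or a candidate zipper actually dimerizes, must be established experimentally.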

EXAMPLE 61

Methods of Making Nucleic Acids 

[0510] The present invention also comprises methods of making
the EST-related nucleic acids, fragments of EST-related
nucleic acids, positional segments of the EST-related nucleic
acids, or fragments of positional segments of the EST-related
nucleic acids. The methods comprise sequentially linking
together nucleotides to produce the nucleic acids having the
preceding sequences. A variety of methods of synthesizing
nucleic acids are known to those skilled in the art.
[0511] In many of these methods, synthesis is conducted on a
solid support. These include the 3' phosphoramidite methods in
which the 3' terminal base of the desired oligonucleotide is
immobilized on an insoluble carrier. The nucleotide base to be
added is blocked at the 5' hydroxyl and activated at the 3'
hydroxyl so as to cause coupling with the immobilized
nucleotide base. Deblocking of the new immobilized nucleotide
compound and repetition of the cycle will produce the desired
polynucleotide. Alternatively, polynucleotides may be prepared
as described in U.S. Patent No. 5,049,656. In some
embodiments, several polynucleotides prepared as described
above are ligated together to generate longer polynucleotides
having a desired sequence.

EXAMPLE 62 

Methods of Making Polypeptides

[0512] The present invention also comprises methods of making
the polypeptides encoded by EST-related nucleic acids,
fragments of EST-related nucleic acids, positional segments of
the EST-related nucleic acids, or fragments of positional
segments of the EST-related nucleic acids and methods of
making the EST-related polypeptides, fragments of EST-related
polypeptides, positional segments of EST-related polypeptides,
or fragments of EST-related polypeptides. The methods comprise
sequentially linking together amino acids to produce the
polypeptides having the preceding sequences. In some
embodiments, the polypeptides made by these methods are 150
amino acids or less in length. In other embodiments, the
polypeptides made by these methods are 120 amino acids or less
in length.
[0513] A variety of methods of making polypeptides are known
to those skilled in the art, including methods in which the
carboxyl terminal amino acid is bound to polyvinyl benzene or
another suitable resin. The amino acid to be added possesses
blocking groups on its amino moiety and any side chain
reactive groups so that only its carboxyl moiety can react.
The carboxyl group is activated with carbodiimide or another
activating agent and allowed to couple to the immobilized
amino acid. After removal of the blocking group, the cycle is
repeated to generate a polypeptide having the desired
sequence. Alternatively, the methods described in U.S. Patent
No. 5,049,656 may be used.
[0514] As discussed above, the EST-related nucleic acids,
fragments of the EST-related nucleic acids, positional
segments of the EST-related nucleic acids, or fragments of
positional segments of the EST-related nucleic acids can be
used for various purposes. The polynucleotides can be used to
express recombinant protein for analysis, characterization or
therapeutic use; production of secreted polypeptides or
chimeric polypeptides, antibody production, as markers for
tissues in which the corresponding protein is preferentially
expressed (either constitutively or at a particular stage of
tissue differentiation or development or in disease states);
as molecular weight markers on Southern gels; as chromosome
markers or tags (when labeled) to identify chromosomes or to
map related gene positions; to compare with endogenous DNA
sequences in patients to identify potential genetic disorders;
as probes to hybridize and thus discover novel, related DNA
sequences; as a source of information to derive PCR primers
for genetic fingerprinting; for selecting and making oligomers
for attachment to a "gene chip" or other support, including
for examination for expression patterns; to raise anti-protein
antibodies using DNA immunization techniques; and as an
antigen to raise anti-DNA antibodies or elicit another immune
response. Where the polynucleotide encodes a protein or
polypeptide which binds or potentially binds to another
protein or polypeptide (such as, for example, in a
receptor-ligand interaction), the polynucleotide can also be
used in interaction trap assays (such as, for example, that
described in Gyuris et al., Cell 75:791-803 (1993)) to
identify polynucleotides encoding the other protein or
polypeptide with which binding occurs or to identify
inhibitors of the binding interaction.

[0515] The proteins or polypeptides provided by the present
invention can similarly be used in assays to determine
biological activity, including in a panel of multiple proteins
for high-throughput screening; to raise antibodies or to
elicit another immune response; as a reagent (including the
labeled reagent) in assays designed to quantitatively
determine levels of the protein (or its receptor) in
biological fluids; as markers for tissues in which the
corresponding protein is preferentially expressed (either
constitutively or at a particular stage of tissue
differentiation or development or in a disease state); and, of
course, to isolate correlative receptors or ligands. Where the
protein or polypeptide binds or potentially binds to another
protein or polypeptide (such as, for example, in a
receptor-ligand interaction), the protein can be used to
identify the other protein with which binding occurs or to
identify inhibitors of the binding interaction. Proteins or
polypeptides involved in these binding interactions can also
be used to screen for peptide or small molecule inhibitors or
agonists of the binding interaction.

[0516] Any or all of these research utilities are capable of
being developed into reagent grade or kit format for
commercialization as research products.
[0517] Methods for performing the uses listed above are well
known to those skilled in the art. References disclosing such
methods include without limitation "Molecular Cloning: A
Laboratory Manual", 2d ed., Cold Spring Harbor Laboratory
Press, Sambrook, J., E.F. Fritsch and T. Maniatis eds., 1989,
and "Methods in Enzymology: Guide to Molecular Cloning
Techniques", Academic Press, Berger, S.L. and A.R. Kimmel
eds., 1987.
[0518] Polynucleotides and proteins or polypeptides of the
present invention can also be used as nutritional sources or
supplements. Such uses include without limitation use as a
protein or amino acid supplement, use as a carbon source, use
as a nitrogen source and use as a source of carbohydrate. In
such cases the protein or polynucleotide of the invention can
be added to the feed of a particular organism or can be
administered as a separate solid or liquid preparation, such
as in the form of powder, pills, solutions, suspensions or
capsules. In the case of microorganisms, the protein or
polynucleotide of the invention can be added to the medium in
or on which the microorganism is cultured.
[0519] Although this invention has been described in terms of
certain preferred embodiments, other embodiments which will be
apparent to those of ordinary skill in the art in view of the
disclosure herein are also within the scope of this invention.
Accordingly, the scope of the invention is intended to be
defined only by reference to the appended claims.



Claims 

1. A purified nucleic acid comprising a sequence selected
from the group consisting of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681 and sequences complementary to
the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681.

2. A purified nucleic acid comprising at least 10
consecutive nucleotides of a sequence selected from
the group consisting of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681 and sequences complementary to
the sequences of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681.

3. A purified nucleic acid comprising at least 15 con- 
secutive nucleotides of a sequence selected from 
the group consisting of SEQ ID NOs: 24-4100 and 
SEQ ID NOs: 8178-36681 and sequences comple- 
mentary to the sequences of SEQ ID NOs: 24-4100 
and SEQ ID NOs: 8178-36681.

4. A purified nucleic acid comprising the coding se-
quence of a sequence selected from the group con-
sisting of SEQ ID NOs: 24-4100.

5. A purified nucleic acid comprising the full coding se- 
quences of a sequence selected from the group 
consisting of SEQ ID NOs: 3721-3811 wherein the 
full coding sequence comprises the sequence en- 
coding the signal peptide and the sequence encod- 
ing the mature protein. 

6. A purified nucleic acid comprising a contiguous 
span of a sequence selected from the group con- 
sisting of SEQ ID NOs: 3721-3811 which encodes 
the mature protein. 

7. A purified nucleic acid comprising a contiguous 
span of a sequence selected from the group con- 
sisting of SEQ ID NOs: 24-652 and 3721-3811 
which encode the signal peptide. 

8. A purified nucleic acid encoding a polypeptide com- 
prising a sequence selected from the group consist- 
ing of the sequences of SEQ ID NOs: 4101-8177. 

9. A purified nucleic acid encoding a polypeptide com- 
prising a sequence selected from the group consist- 
ing of the sequences of SEQ ID NOs: 7798-7888. 

10. A purified nucleic acid encoding a polypeptide com- 
prising a mature protein included in a sequence se- 
lected from the group consisting of the sequences 
of SEQ ID NOs: 7798-7888. 

11. A purified nucleic acid encoding a polypeptide com-
prising a signal peptide included in a sequence se-
lected from the group consisting of the sequences
of SEQ ID NOs: 4101-4729 and 7798-7888.

12. A purified nucleic acid at least 15 nucleotides in 
length which hybridizes under stringent conditions 
to a sequence selected from the group consisting 
of SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681 and sequences complementary to the 
sequences of SEQ ID NOs: 24-4100 and SEQ ID 
NOs: 8178-36681. 



13. A purified or isolated polypeptide comprising a
sequence selected from the group consisting of the
sequences of SEQ ID NOs: 4101-8177.

14. A purified or isolated polypeptide comprising a
sequence selected from the group consisting of SEQ
ID NOs: 7798-7888.

15. A purified or isolated polypeptide comprising a
mature protein of a polypeptide selected from the
group consisting of SEQ ID NOs: 7798-7888.

16. A purified or isolated polypeptide comprising a
signal peptide of a sequence selected from the group
consisting of the polypeptides of SEQ ID NOs:
4101-4729 and 7798-7888.

17. A purified or isolated polypeptide comprising at
least 10 consecutive amino acids of a sequence
selected from the group consisting of the sequences
of SEQ ID NOs: 4101-8177.

18. A method of making a cDNA comprising the steps 
of: 

contacting a collection of mRNA molecules
from human cells with a primer comprising at
least 15 consecutive nucleotides of a sequence
selected from the group consisting of the
sequences complementary to SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681;
hybridizing said primer to an mRNA in said
collection that encodes said protein;
reverse transcribing said hybridized primer to
make a first cDNA strand from said mRNA;
making a second cDNA strand complementary
to said first cDNA strand; and
isolating the resulting cDNA encoding said
protein comprising said first cDNA strand and said
second cDNA strand.

19. A purified cDNA obtainable by the method of Claim 
18. 

20. The cDNA of Claim 19 wherein said cDNA encodes
at least a portion of a human polypeptide. 

21. A method of making a cDNA comprising the steps 
of: 

so 

obtaining a cDNA comprising a sequence se- 
lected from the group consisting of SEQ ID 
NOs: 24-4100 and SEQ ID NOs: 8178-36681; 
contacting said cDNA with a detectable probe
comprising at least 15 consecutive nucleotides
of a sequence selected from the group consisting
of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and the sequences complementary
to SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 under conditions which permit said
probe to hybridize to said cDNA;
identifying a cDNA which hybridizes to said
detectable probe; and
isolating said cDNA which hybridizes to said
probe.

22. A purified cDNA obtainable by the method of Claim
21.

23. The cDNA of Claim 22 wherein said cDNA encodes 
at least a portion of a human polypeptide. 

24. A method of making a cDNA comprising the steps 
of: 

contacting a collection of mRNA molecules 
from human cells with a first primer capable of 
hybridizing to the polyA tail of said mRNA; 
hybridizing said first primer to said polyA tail; 
reverse transcribing said mRNA to make a first 
cDNA strand; 

making a second cDNA strand complementary 
to said first cDNA strand using at least one 
primer comprising at least 15 consecutive nu-
cleotides of a sequence selected from the
group consisting of SEQ ID NOs: 24-4100 and 
SEQ ID NOs: 8178-36681; and 
isolating the resulting cDNA comprising said 
first cDNA strand and said second cDNA 
strand. 

25. A purified cDNA obtainable by the method of Claim
24. 

26. The cDNA of Claim 25 wherein said cDNA encodes 
at least a portion of a human polypeptide. 

27. The method of Claim 24, wherein the second cDNA 
strand is made by: 

contacting said first cDNA strand with a first pair
of primers, said first pair of primers comprising
a second primer comprising at least 15 consecutive
nucleotides of a sequence selected from
the group consisting of SEQ ID NOs: 24-4100
and SEQ ID NOs: 8178-36681 and a third primer
having a sequence therein which is included
within the sequence of said first primer;
performing a first polymerase chain reaction
with said first pair of primers to generate a first
PCR product;
contacting said first PCR product with a second
pair of primers, said second pair of primers
comprising a fourth primer, said fourth primer
comprising at least 15 consecutive nucleotides
of said sequence selected from the group consisting
of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681, and a fifth primer, wherein
said fourth and fifth hybridize to sequences
within said first PCR product; and
performing a second polymerase chain reaction,
thereby generating a second PCR product.

28. A purified cDNA obtainable by the method of Claim 
27. 

29. The cDNA of Claim 28 wherein said cDNA encodes 
at least a portion of a human polypeptide. 

30. The method of Claim 24 wherein the second cDNA 
strand is made by: 

contacting said first cDNA strand with a second 
primer comprising at least 15 consecutive nu- 
cleotides of a sequence selected from the 
group consisting of SEQ ID NOs: 24-4100 and 
SEQ ID NOs: 8178-36681; 
hybridizing said second primer to said first 
strand cDNA; and 

extending said hybridized second primer to 
generate said second cDNA strand. 

31. A purified cDNA obtainable by the method of Claim
30.

32. The cDNA of Claim 28, wherein said cDNA encodes
at least a portion of a human polypeptide. 

33. A method of making a polypeptide comprising the 
steps of: 

obtaining a cDNA which encodes a polypeptide 
encoded by a nucleic acid comprising a se- 
quence selected from the group consisting of 
SEQ ID NOs: 24-4100 or a cDNA which encodes
a polypeptide comprising at least 10 consecutive
amino acids of a polypeptide encoded
by a sequence selected from the group consisting
of SEQ ID NOs: 24-4100;
inserting said cDNA in an expression vector
such that said cDNA is operably linked to a
promoter;

introducing said expression vector into a host 
cell whereby said host cell produces the protein 
encoded by said cDNA; and 
isolating said protein. 

34. An isolated protein obtainable by the method of 
Claim 33. 

35. A method of obtaining a promoter DNA comprising
the steps of:

obtaining genomic DNA located upstream of a
nucleic acid comprising a sequence selected
from the group consisting of SEQ ID NOs:
24-4100 and SEQ ID NOs: 8178-36681 and the
sequences complementary to the sequences of
SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681;
screening said genomic DNA to identify a
promoter capable of directing transcription
initiation; and
isolating said DNA comprising said identified
promoter.

36. The method of Claim 35, wherein said obtaining
step comprises walking from genomic DNA comprising
a sequence selected from the group consisting
of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and the sequences complementary to
SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681.

37. The method of Claim 36, wherein said screening
step comprises inserting genomic DNA located
upstream of a sequence selected from the group
consisting of SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 and the sequences complementary to
SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 into a promoter reporter vector.

38. The method of Claim 36, wherein said screening
step comprises identifying motifs in genomic DNA
located upstream of a sequence selected from the
group consisting of SEQ ID NOs: 24-4100 and SEQ
ID NOs: 8178-36681 and the sequences complementary
to SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681 which are transcription factor binding
sites or transcription start sites.

39. An isolated promoter obtainable by the method of 
any one of Claims 34 to 38. 

40. In an array of discrete ESTs or fragments thereof of 
at least 15 nucleotides in length, the improvement 
comprising inclusion in said array of at least one se- 
quence selected from the group consisting of SEQ 
ID NOs: 24-4100 and SEQ ID NOs: 8178-36681, 
the sequences complementary to the sequences of 
SEQ ID NOs: 24-4100 and SEQ ID NOs: 
8178-36681 and fragments comprising at least 15 
consecutive nucleotides of said sequence. 

41. The array of Claim 40 including therein at least two
sequences selected from the group consisting of
SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681, the sequences complementary to the
sequences of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681, and fragments comprising at
least 15 consecutive nucleotides of said sequences.

42. The array of Claim 40 including therein at least five
sequences selected from the group consisting of
SEQ ID NOs: 24-4100 and SEQ ID NOs:
8178-36681, the sequences complementary to the
sequences of SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681 and fragments comprising at
least 15 consecutive nucleotides of said sequences.

43. An enriched population of recombinant nucleic acids,
said recombinant nucleic acids comprising an
insert nucleic acid and a backbone nucleic acid,
wherein at least 5% of said insert nucleic acids in
said population comprise a sequence selected from
the group consisting of SEQ ID NOs: 24-4100 and
SEQ ID NOs: 8178-36681 and the sequences
complementary to SEQ ID NOs: 24-4100 and SEQ ID
NOs: 8178-36681.

44. A purified or isolated antibody capable of specifically
binding to a polypeptide comprising a sequence
selected from the group consisting of SEQ ID NOs:
4101-8177.

45. A purified or isolated antibody capable of specifically
binding to a polypeptide comprising at least 10
consecutive amino acids of a sequence selected
from the group consisting of SEQ ID NOs:
4101-8177.

46. An antibody composition capable of selectively
binding to an epitope-containing fragment of a
polypeptide comprising a contiguous span of at
least 8 amino acids of any of SEQ ID NOs:
4101-8177, wherein said antibody is polyclonal or
monoclonal.

47. A computer readable medium having stored thereon
a sequence selected from the group consisting
of a nucleic acid code of SEQ ID NOs: 24-4100 and
8178-36681 and a polypeptide code of SEQ ID
NOs: 4101-8177.

48. A computer system comprising a processor and a
data storage device wherein said data storage device
has stored thereon a sequence selected from
the group consisting of a nucleic acid code of SEQ
ID NOs: 24-4100 and 8178-36681 and a polypeptide
code of SEQ ID NOs: 4101-8177.

49. The computer system of Claim 48 further comprising
a sequence comparer and a data storage device
having reference sequences stored thereon.

50. The computer system of Claim 49 wherein said
sequence comparer comprises a computer program
which indicates polymorphisms.

51. The computer system of Claim 48 further compris- 
ing an identifier which identifies features in said se- 
quence. 

52. A method for comparing a first sequence to a reference
sequence wherein said first sequence is selected
from the group consisting of a nucleic acid
code of SEQ ID NOs: 24-4100 and 8178-36681 and
a polypeptide code of SEQ ID NOs: 4101-8177
comprising the steps of:

reading said first sequence and said reference
sequence through use of a computer program
which compares sequences; and
determining differences between said first se- 
quence and said reference sequence with said 
computer program. 

53. The method of Claim 52, wherein said step of de- 
termining differences between the first sequence 
and the reference sequence comprises identifying 
polymorphisms. 

54. A method for identifying a feature in a sequence se- 
lected from the group consisting of a nucleic acid 
code of SEQ ID NOs: 24-4100 and 8178-36681 and
a polypeptide code of SEQ ID NOs: 4101-8177 
comprising the steps of: 

reading said sequence through the use of a 
computer program which identifies features in 
sequences; and 

identifying features in said sequence with said 
computer program. 

55. A vector comprising a nucleic acid according to any 
one of Claims 1 to 12. 

56. A host cell containing a nucleic acid of Claim 55. 

57. A method of making a nucleic acid of Claim 1 com-
prising the steps of:

introducing said nucleic acid into a host cell 
such that said nucleic acid is present in multiple 
copies in each host cell; and 
isolating said nucleic acid from said host cell. 

58. A method of making a nucleic acid of any one of 
Claims 1 to 12 comprising the step of sequentially 
linking together the nucleotides in said nucleic ac- 
ids. 

59. A method of making a polypeptide of any one of
Claims 13 to 17 wherein said polypeptides is 150
amino acids in length or less comprising the step of
sequentially linking together the amino acids in said
polypeptides.

60. A method of making a polypeptide of any one of 
Claims 13 to 17 wherein said polypeptides is 120 
amino acids in length or less comprising the step of 
sequentially linking together the amino acids in said 
polypeptides.

Minimum signal   False positive   False negative   proba(0.1)   proba(0.2)
peptide score    rate             rate
3.5              0.121            0.036            0.467        0.664
4                0.096            0.06             0.519        0.708
4.5              0.078            0.079            0.565        0.745
5                0.062            0.098            0.615        0.782
5.5              0.05             0.127            0.659        0.813
6                0.04             0.163            0.694        0.836
6.5              0.033            0.202            0.725        0.855
7                0.025            0.246            0.763        0.878
7.5              0.021            0.304            0.78         0.889
8                0.015            0.368            0.816        0.909
8.5              0.012            0.418            0.836        0.92
9                0.009            0.512            0.856        0.93
9.5              0.007            0.581            0.863        0.934
10               0.006            0.579            0.835        0.919

Description of Transcription Factor Binding Sites present on promoters isolated from
SignalTag sequences



Matrix           Position  Orientation  Score        Length  Sequence          Location in SEQ ID NO: 17
CMYB_01          -502      +            0.983        9       TGTCAGTTG         17-25
MYOD_Q6          -501      -            [illegible]  10      [illegible]       complement of 18-27
S8_01            -444      -            0.960        11      AATAGAATTAG       complement of 75-85
S8_01            -425      +            0.966        11      AACTAAATTAG       94-104
DELTAEF1_01      -390      -            0.960        11      GCACACCTCAG       complement of 129-139
GATA_C           -364      -            0.964        11      AGATAAATCCA       complement of 155-165
CMYB_01          -349      +            0.958        9       CTTCAGTTG         170-178
GATA1_02         -343      +            0.959        14      ttgtagataggaca    176-189
GATA_C           -339      +            0.953        11      AGATAGGACAT       180-190
TAL1ALPHAE47_01  -235      +            0.973        16      cataacagatggtaag  284-299
TAL1BETAE47_01   -235      +            0.983        16      cataacagatggtaag  284-299
TAL1BETAITF2_01  -235      +            0.978        16      cataacagatggtaag  284-299
MYOD_Q6          -232      -            0.954        10      accatctgtt        complement of 287-296
GATA1_04         -217      -            0.953        13      tcaagataaagta     complement of 302-314
IK1_01           -126      +            0.963        13      agttgggaattcc     393-405
IK2_01           -126      +            0.985        12      agttgggaattc      393-404
CREL_01          -123      +            0.962        10      tgggaattcc        396-405
GATA1_02         -96       +            0.950        14      tcagtgatatggca    423-436
SRY_02           -41       -            0.951        12      taaaacaaaaca      complement of 478-489
E2F_02           -33       +            0.957        8       tttagcgc          486-493
MZF1_01          -5        -            0.975        8       tgagggga          complement of 514-521

Promoter sequence ...B4 (861 bp):

Matrix           Position  Orientation  Score        Length  Sequence          Location in SEQ ID NO: 20
NFY_Q6           -748      -            0.956        11      ggaccaatcat       complement of 60-70
MZF1_01          -738      +            0.962        8       cctgggga          70-77
CMYB_01          -684      +            0.994        9       tgaccgttg         124-132
VMYB_02          -682      -            0.985        9       tccaacggt         complement of 126-134
STAT_01          -673      +            0.968        9       ttcctggaa         135-143
STAT_01          -673      -            0.951        9       ttccaggaa         complement of 135-143
MZF1_01          -556      -            0.956        8       ttggggga          complement of 252-259
IK2_01           -451      +            0.965        12      gaatgggatttc      357-368
MZF1_01          -424      +            0.986        8       agagggga          384-391
SRY_02           -398      -            0.955        12      gaaaacaaaaca      complement of 410-421
MZF1_01          -216      +            0.960        8       GAAGGGGA          592-599
MYOD_Q6          -190      +            0.981        10      agcatctgcc        618-627
DELTAEF1_01      -176      +            0.958        11      tcccaccttcc       632-642
S8_01            5         -            0.992        11      gaggcaattat       complement of 813-823
MZF1_01          16        -            0.986        8       agagggga          complement of 824-831

Promoter sequence P29B6 (555 bp):

Matrix           Position  Orientation  Score        Length  Sequence          Location in SEQ ID NO: 23
ARNT_01          -311      +            0.964        16      ggactcacgtgctgct  191-206
NMYC_01          -309      +            0.965        12      actcacgtgctg      193-204
USF_01           -309      +            0.985        12      actcacgtgctg      193-204
USF_01           -309      -            0.985        12      cagcacgtgagt      complement of 193-204
NMYC_01          -309      -            0.956        12      cagcacgtgagt      complement of 193-204
MYCMAX_02        -309      -            0.972        12      cagcacgtgagt      complement of 193-204
USF_C            -307      +            0.997        8       tcacgtgc          195-202
USF_C            -307      -            0.991        8       gcacgtga          complement of 195-202
MZF1_01          -292      -            0.968        8       catgggga          complement of 210-217
ELK1_02          -105      +            0.963        14      ctctccggaagcct    397-410
CETS1P54_01      -102      +            0.974        10      tccggaagcc        400-409
AP1_Q4           -42       -            0.963        11      agtgactgaac       complement of 460-470
AP1FJ_Q2         -42       -            0.961        11      agtgactgaac       complement of 460-470
PADS_C           45        +            1.000        9       TGTGGTCTC         547-555

FIGURE 5